(57) ABSTRACT

A system and method for estimating statistics concerning 
system metrics to provide for the accurate and efficient 
monitoring of one or more computer systems. The system 
preferably comprises a distributed computing environment, 
i.e., an enterprise, which comprises a plurality of intercon- 
nected computer systems. At least one of the computer 
systems is an agent computer system which includes agent 
software and/or system software for the collection of data 
relating to one or more metrics, i.e., measurements of system 
resources. Metric data is continually collected over the 
course of a measurement interval, regularly placed into a 
registry of metrics, and then periodically sampled from the 
registry indirectly. Sampling-related uncertainty and inac- 
curacy arise from two primary sources: the unsampled 
residual segments of seen (i.e., sampled and therefore 
known) events, and unseen (i.e., unsampled and therefore 
unknown) events. The total unsampled utilization and the 
total unseen utilization are accurately estimated according to 
the properties of one or more process service time distribu- 
tions. The total unseen utilization is also estimated with an 
iterative method using gradations of the sample interval. The 
length distribution of the unseen processes is determined 
with the same iterative method. 

75 Claims, 18 Drawing Sheets 
ENTERPRISE MANAGEMENT SYSTEM AND
METHOD WHICH INCLUDES STATISTICAL 
RECREATION OF SYSTEM RESOURCE 
USAGE FOR MORE ACCURATE 
MONITORING, PREDICTION, AND 
PERFORMANCE WORKLOAD 
CHARACTERIZATION 

BACKGROUND OF THE INVENTION 

1. Field of the Invention 

The present invention relates to the collection, analysis, 
and management of system resource data in distributed or 
enterprise computer systems, and particularly to the more 
accurate monitoring of the state of a computer system and 
more accurate prediction of system performance. 

2. Description of the Related Art 

The data processing resources of business organizations 
are increasingly taking the form of a distributed computing 
environment in which data and processing are dispersed 
over a network comprising many interconnected, 
heterogeneous, geographically remote computers. Such a 
computing environment is commonly referred to as an 
enterprise computing environment, or simply an enterprise. 
Managers of the enterprise often employ software packages 
known as enterprise management systems to monitor, 
analyze, and manage the resources of the enterprise. Enter- 
prise management systems may provide for the collection of 
measurements, or metrics, concerning the resources of indi- 
vidual systems. For example, an enterprise management 
system might include a software agent on an individual 
computer system for the monitoring of particular resources 
such as CPU usage or disk access. U.S. Pat. No. 5,655,081 
discloses one example of an enterprise management system. 

In a sophisticated enterprise management system, tools 
for the analysis, modeling, planning, and prediction of 
system resource utilization are useful for assuring the sat- 
isfactory performance of one or more computer systems in 
the enterprise. Examples of such analysis and modeling 
tools are the "ANALYZE" and "PREDICT" components of 
"BEST/1 FOR DISTRIBUTED SYSTEMS" available from 
BMC Software, Inc. Such tools usually require the input of 
periodic measurements of the usage of resources such as 
central processing units (CPUs), memory, hard disks, net- 
work bandwidth, and the like. To ensure accurate analysis 
and modeling, therefore, the collection of accurate perfor- 
mance data is critical. 

Many modern operating systems, including "WINDOWS
NT" and UNIX, are capable of recording and maintaining an
enormous amount of performance data and other data con- 
cerning the state of the hardware and software of a computer 
system. Such data collection is a key step for any system 
performance analysis and prediction. The operating system 
or system software collects raw performance data, usually at 
a high frequency, stores the data in a registry of metrics, and 
then periodically updates the data. In most cases, metric data 
is not used directly, but is instead sampled from the registry. 
Sampling at a high frequency, however, can consume sub- 
stantial system resources such as CPU cycles, storage space, 
and I/O bandwidth. Therefore, it is impractical to sample the 
data at a high frequency. On the other hand, infrequent 
sampling cannot capture the complete system state: for 
example, significant short-lived events and/or processes can 
be missed altogether. Infrequent sampling may therefore 
distort a model of a system's performance. The degree to 
which the sampled data reliably reflects the raw data determines
the usefulness of the performance model for system
capacity planning. The degree of reliability also determines 
the usefulness of the performance statistics presented to 
end-users by performance tools. 

Sensitivity to sampling frequency varies among data
types. Performance data can be classified into three categories:
cumulative, transient, and constant. Cumulative data is
data that accumulates over time. For example, a system CPU
time counter may collect the total number of seconds that a
processor has spent in system state since system boot. With
transient data, old data is replaced by new data. For example,
the amount of free memory is a transient metric which is
updated periodically to reflect the amount of memory not in
use. However, values such as the mean, variance, and
standard deviation can be computed based on a sampling
history of the transient metric. The third type of performance
data, constant data, does not change over the measurement
interval or lifetime of the event. For example, system
configuration information, process ID, and process start time
are generally constant values.
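
The distinction between cumulative and transient data can be illustrated with a minimal Python sketch. The function names and the sample values below are hypothetical and are not taken from the described system; the sketch only shows that a cumulative counter yields a rate from the difference of two readings, whereas a transient metric is summarized from its sampling history.

```python
from statistics import mean, pvariance


def cumulative_rate(first_reading, last_reading, elapsed_seconds):
    """Average utilization implied by two readings of a cumulative counter."""
    return (last_reading - first_reading) / elapsed_seconds


def transient_summary(samples):
    """Mean and population variance of a sampled transient metric."""
    return mean(samples), pvariance(samples)


# Hypothetical readings: CPU seconds in system state, taken 60 seconds apart.
cpu_util = cumulative_rate(1200.0, 1215.0, 60.0)

# Hypothetical sampling history of free memory, in megabytes.
free_mem_mean, free_mem_var = transient_summary([512, 480, 530, 498])
```

Constant data needs no such treatment, since a single reading suffices for the whole measurement interval.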

Of the three data types, transient performance metrics are
the most sensitive to variations in the sample interval and are
therefore the most likely to be characterized by uncertainty.
For example, with infrequent sampling, some state changes
may be missed completely. However, cumulative data may
also be rendered uncertain by infrequent sampling, especially
with regard to the variance of such a metric. Clearly,
then, uncertainty of data caused by infrequent sampling can
cause serious problems in performance modeling. Therefore,
the goal is to use sampling to capture the essence of the
system state with a sufficient degree of certainty.
Nevertheless, frequent sampling is usually not a viable
option because of the heavy resource usage involved.

For the foregoing reasons, there is a need for data collection
and analysis tools and methods that accurately and
efficiently reflect system resource usage at a lower sampling 
frequency. 

SUMMARY OF THE INVENTION 

The present invention is directed to a system and method
that meet the needs for more accurate and efficient monitoring
and prediction of computer system performance. In
the preferred embodiment, the system and method are used
in a distributed computing environment, i.e., an enterprise.
The enterprise comprises a plurality of computer systems, or
nodes, which are interconnected through a network. At least
one of the computer systems is a monitor computer system
from which a user may monitor the nodes of the enterprise.
At least one of the computer systems is an agent computer
system. An agent computer system includes agent software
and/or system software that permits the collection of data
relating to one or more metrics, i.e., measurements of system
resources on the agent computer system. In the preferred
embodiment, metric data is continually collected at a high
frequency over the course of a measurement interval and
placed into a registry of metrics. The metric data is not used
directly but rather is routinely sampled at a constant sample
interval from the registry of metrics. Because sampling uses
substantial system resources, sampling is preferably performed
at a lesser frequency than the frequency of collection.

Sampled metric data can be used to build performance
models for analysis and capacity planning. However, less
frequent sampling can result in inaccurate models and data
uncertainty, especially regarding the duration of events or
processes and the number of events or processes. The
present invention is directed to reducing said uncertainty.
Uncertainty arises from two primary sources: the unsampled
segment of a seen process or event, and the unseen process
or event. A seen process is a process that is sampled at least
once; therefore, its existence and starting time are known.
However, the residual time or utilization between the last
sampling of the process or event and the death of the process
or the termination of the event is unsampled and unknown.
An unseen process is shorter than the sample interval and is
not sampled at all, and therefore its entire utilization is
unknown. Nevertheless, the total unsampled (i.e., residual)
utilization and the total unseen utilization can be estimated
with the system and method of the present invention.
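
The difference between a seen and an unseen process can be sketched in Python as follows. This is an illustration only, under the assumption (not stated in the text) that samples are taken at the instants 0, T, 2T, ... for a sample interval T; the function names are hypothetical.

```python
import math


def sample_hits(start, end, sample_interval):
    """Count sample instants k * sample_interval that fall in [start, end)."""
    first = math.ceil(start / sample_interval)
    last = math.ceil(end / sample_interval) - 1
    return max(0, last - first + 1)


def classify(start, end, sample_interval):
    """Return ("seen", residual time) or ("unseen", whole lifetime)."""
    if sample_hits(start, end, sample_interval) == 0:
        # Never sampled: the entire lifetime is unknown to the sampler.
        return ("unseen", end - start)
    # Sampled at least once: only the tail after the last sample is unknown.
    last_sample_time = (math.ceil(end / sample_interval) - 1) * sample_interval
    return ("seen", end - last_sample_time)
```

With a sample interval of 10, a process alive from time 12 to time 27 is sampled once (at time 20) and leaves a residual of 7 time units, while a process alive from 21 to 28 falls entirely between two samples and is unseen.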

In determining the total unsampled utilization, a quantity
of process service time distributions is determined, and
each of the seen processes is assigned a respective process
service time distribution. For each distribution, a mean
residual time is calculated using equations provided by the
system and method. The total unsampled utilization is the
sum of the mean residual time multiplied by the number of
seen processes for each distribution, all divided by the
measurement interval.
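
The summation described above can be sketched in Python as follows. The mean residual times are treated here as given inputs, because the equations that produce them are not reproduced in this summary; the function name and the numeric values are hypothetical.

```python
def total_unsampled_utilization(groups, measurement_interval):
    """Sum (mean residual time * number of seen processes) over distributions.

    groups: iterable of (mean_residual_time, seen_process_count) pairs,
            one pair per process service time distribution.
    """
    residual_time = sum(mean_residual * count for mean_residual, count in groups)
    return residual_time / measurement_interval


# Hypothetical example: two distributions over a 400-second interval.
groups = [(0.5, 40), (0.25, 80)]
unsampled = total_unsampled_utilization(groups, 400.0)
```

In this example each distribution contributes 20 seconds of residual time, so the total unsampled utilization is 40 / 400, or 10 percent of the measurement interval.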

In determining the total unseen utilization, first the total
captured utilization is determined to be the sum of the
sampled utilizations of all seen processes over the measurement
interval. Next the total measured utilization, or the
"actual" utilization over the measurement interval, is
obtained from the system software or monitoring software.
The difference between the total measured utilization and
the total captured utilization is the uncertainty. Because the
uncertainty is due to either unsampled segments or unseen
events, the total unseen utilization is calculated to be the
uncertainty (the total measured utilization minus the total
captured utilization) minus the total unsampled utilization.
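
The arithmetic in the preceding paragraph can be written as a short Python sketch. The function name and the example values are hypothetical; only the subtraction itself comes from the text.

```python
def total_unseen_utilization(measured, captured, unsampled):
    """Unseen utilization = (measured - captured) - unsampled."""
    uncertainty = measured - captured  # everything the samples did not capture
    return uncertainty - unsampled     # remove the residual part of seen processes


# Hypothetical utilizations expressed as fractions of the measurement interval.
unseen = total_unseen_utilization(measured=0.75, captured=0.5, unsampled=0.125)
```

Here the uncertainty is 0.25, of which 0.125 is attributed to unsampled segments, leaving 0.125 for unseen processes.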

When the total measured utilization is not available, the
total unseen utilization is estimated with an iterative bucket
method. A matrix of buckets is created, wherein each row
corresponds to the sample interval and each bucket to a
gradation of the sample interval. Each process is placed into
the appropriate bucket according to how many times it was
sampled and when in the sample interval it began. Starting
with the bucket with the longest process(es) and working
iteratively back through the other buckets, the number of
unseen processes is estimated for each length gradation of
the sample interval. The iterative bucket method is also used
to determine a length distribution of unseen processes.
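
The placement step alone (not the iterative estimation, whose details are not given in this summary) might look like the following Python sketch. It assumes, for illustration only, that a bucket is addressed by the number of times a process was sampled and by the gradation of the sample interval in which the process began; the function names and the layout of the matrix are hypothetical.

```python
from collections import Counter


def bucket_index(times_sampled, start_time, sample_interval, gradations):
    """Map a process to (times sampled, gradation of its start offset)."""
    offset = start_time % sample_interval
    column = min(int(offset * gradations / sample_interval), gradations - 1)
    return times_sampled, column


def fill_buckets(processes, sample_interval, gradations):
    """Count seen processes per bucket; processes are (times_sampled, start_time)."""
    buckets = Counter()
    for times_sampled, start_time in processes:
        index = bucket_index(times_sampled, start_time, sample_interval, gradations)
        buckets[index] += 1
    return buckets


# Hypothetical seen processes with a sample interval of 10 split into 5 gradations.
seen = [(2, 13), (2, 14), (1, 29)]
buckets = fill_buckets(seen, sample_interval=10, gradations=5)
```

Only seen processes can be placed this way; the counts for unseen processes are what the iterative method estimates from the populated buckets.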

In response to the determination of utilizations described
above, the system and method are able to use this information
in modeling and/or analyzing the enterprise. In various
embodiments, the modeling and/or analyzing may further
comprise one or more of the following: displaying the
determinations to a user, predicting future performance,
graphing a performance prediction, generating reports, asking
a user for further data, permitting a user to modify a
model of the enterprise, and altering a configuration of the
enterprise in response to the determinations.

BRIEF DESCRIPTION OF THE DRAWINGS 

A better understanding of the present invention can be
obtained when the following detailed description of the 
preferred embodiment is considered in conjunction with the 
following drawings, in which: 

FIG. 1 is a network diagram of an illustrative enterprise 
computing environment;

FIG. 2 is an illustration of a typical computer system with 
computer software programs; 
FIG. 3 is a block diagram illustrating an overview of the
enterprise management system according to the preferred 
embodiment of the present invention; 

FIG. 4 is a block diagram illustrating an overview of the 
Monitor component of the enterprise management system 
according to the preferred embodiment of the present inven- 
tion; 

FIG. 5 is a block diagram illustrating an overview of the 
Agent component of the enterprise management system 
according to the preferred embodiment of the present inven- 
tion; 

FIG. 6 is a block diagram illustrating an overview of the 
Analyze component of the enterprise management system 
according to the preferred embodiment of the present inven- 
tion; 

FIG. 7 is a block diagram illustrating an overview of the 
Predict component of the enterprise management system 
according to the preferred embodiment of the present inven- 
tion; 

FIG. 8 is a flowchart illustrating an overview of the 
collection and sampling of metric data; 

FIG. 9 is a diagram illustrating an unsampled segment of 
a seen event; 

FIG. 10 is a diagram illustrating an unseen event; 

FIG. 11 is a flowchart illustrating an overview of the 
estimation of metric data statistics; 

FIG. 12 is a flowchart illustrating the determination of the 
total uncaptured utilization; 

FIG. 13 is a flowchart further illustrating the determina- 
tion of the total uncaptured utilization; 

FIG. 14 is a flowchart illustrating the determination of the 
portion of the total uncaptured utilization for an exponential 
distribution; 

FIG. 15 is a flowchart illustrating the determination of the
portion of the total uncaptured utilization for a uniform 
distribution; 

FIG. 16 is a flowchart illustrating the determination of the 
portion of the total uncaptured utilization for an unknown 
distribution; 

FIG. 17 is a flowchart illustrating an alternative method of 
the determination of the portion of the total uncaptured 
utilization for an unknown distribution; 

FIG. 18 is a flowchart illustrating the determination of the 
total unseen utilization; 

FIG. 19 illustrates a matrix of buckets used in the esti- 
mation of the total unseen utilization; 

FIG. 20 illustrates a specific example of the estimation of 
the total unseen utilization with buckets; 

FIG. 21 is a flowchart illustrating the iterative bucket 
method of estimating the total unseen utilization; 

FIGS. 22 and 23 are equations which are used to generate 
a length distribution of the unseen processes. 

DETAILED DESCRIPTION OF THE 
PREFERRED EMBODIMENT 

U.S. Pat. No. 5,655,081 titled "System for Monitoring and 
Managing Computer Resources and Applications Across a 
Distributed Environment Using an Intelligent, Autonomous
Agent Architecture" is hereby incorporated by reference as 
though fully and completely set forth herein. 

U.S. Pat. No. 5,761,091 titled "Method and System for 
Reducing the Errors in the Measurements of Resource 
Usage in Computer System Processes and Analyzing Process
Data with Subsystem Data" is hereby incorporated by
reference as though fully and completely set forth herein.

FIG. 1 illustrates an enterprise computing environment
according to one embodiment of the present invention. An
enterprise 100 comprises a plurality of computer systems
which are interconnected through one or more networks.
Although one particular embodiment is shown in FIG. 1, the
enterprise 100 may comprise a variety of heterogeneous
computer systems and networks which are interconnected in
a variety of ways and which run a variety of software
applications.

One or more local area networks (LANs) 104 may be
included in the enterprise 100. A LAN 104 is a network that
spans a relatively small area. Typically, a LAN 104 is
confined to a single building or group of buildings. Each
node (i.e., individual computer system or device) on a LAN
104 preferably has its own CPU with which it executes
programs, and each node is also able to access data and
devices anywhere on the LAN 104. The LAN 104 thus
allows many users to share devices (e.g., printers) as well as
data stored on file servers. The LAN 104 may be characterized
by any of a variety of types of topology (i.e., the
geometric arrangement of devices on the network), of
protocols (i.e., the rules and encoding specifications for sending
data, and whether the network uses a peer-to-peer or client/
server architecture), and of media (e.g., twisted-pair wire,
coaxial cables, fiber optic cables, radio waves). As illustrated
in FIG. 1, the enterprise 100 includes one LAN 104.
However, in alternate embodiments the enterprise 100 may
include a plurality of LANs 104 which are coupled to one
another through a wide area network (WAN) 102. A WAN
102 is a network that spans a relatively large geographical
area.

Each LAN 104 comprises a plurality of interconnected
computer systems and optionally one or more other devices:
for example, one or more workstations 110a, one or more
personal computers 112a, one or more laptop or notebook
computer systems 114, one or more server computer systems
116, and one or more network printers 118. As illustrated in
FIG. 1, the LAN 104 comprises one of each of computer
systems 110a, 112a, 114, and 116, and one printer 118. The
LAN 104 may be coupled to other computer systems and/or
other devices and/or other LANs 104 through a WAN 102.

One or more mainframe computer systems 120 may
optionally be coupled to the enterprise 100. As shown in
FIG. 1, the mainframe 120 is coupled to the enterprise 100
through the WAN 102, but alternatively one or more
mainframes 120 may be coupled to the enterprise 100 through
one or more LANs 104. As shown, the mainframe 120 is
coupled to a storage device or file server 124 and mainframe
terminals 122a, 122b, and 122c. The mainframe terminals
122a, 122b, and 122c access data stored in the storage
device or file server 124 coupled to or comprised in the
mainframe computer system 120.

The enterprise 100 may also comprise one or more
computer systems which are connected to the enterprise 100
through the WAN 102: as illustrated, a workstation 110b and
a personal computer 112b. In other words, the enterprise 100
may optionally include one or more computer systems
which are not coupled to the enterprise 100 through a LAN
104. For example, the enterprise 100 may include computer
systems which are geographically remote and connected to
the enterprise 100 through the Internet.

The present invention preferably comprises computer
programs 160 stored on or accessible to each computer
system in the enterprise 100. FIG. 2 illustrates computer
programs 160 and a typical computer system 150. Each
computer system 150 typically comprises components such
as a CPU 152 with an associated memory media. The
memory media stores program instructions of the computer
programs 160, wherein the program instructions are executable
by the CPU 152. The memory media preferably comprises
a system memory such as RAM and/or a nonvolatile
memory such as a hard disk. The computer system 150
further comprises a display device such as a monitor 154, an
alphanumeric input device such as a keyboard 156, and
optionally a directional input device such as a mouse 158.
The computer system 150 is operable to execute computer
programs 160.

When the computer programs are executed on one or
more computer systems 150, an enterprise management
system 180 is operable to monitor, analyze, and manage the
computer programs, processes, and resources of the enterprise
100. Each computer system 150 in the enterprise 100
executes or runs a plurality of software applications or
processes. Each software application or process consumes a
portion of the resources of a computer system and/or network:
for example, CPU time, system memory such as
RAM, nonvolatile memory such as a hard disk, network
bandwidth, and input/output (I/O). The enterprise management
system 180 permits users to monitor, analyze, and
manage resource usage on heterogeneous computer systems
150 across the enterprise 100.

FIG. 3 shows an overview of the enterprise management
system 180. The enterprise management system 180
includes at least one console node 400 and at least one agent
node 300, and it may include a plurality of console nodes 400
and/or a plurality of agent nodes 300. In general, an agent
node 300 executes software to collect metric data on its
computer system 150, and a console node 400 executes
software to monitor, analyze, and manage the collected
metrics from one or more agent nodes 300. A metric is a
measurement of a particular system resource. For example,
in the preferred embodiment, the enterprise management
system 180 collects metrics such as CPU, disk I/O, file
system, database, threads, processes, kernel,
registry, logical volumes, and paging. Each computer system
150 in the enterprise 100 may comprise a console node 400,
an agent node 300, or both a console node 400 and an agent
node 300. In the preferred embodiment, server computer
systems include agent nodes 300, and other computer systems
may also comprise agent nodes 300 as desired, e.g., file
servers, print servers, e-mail servers, and internet servers.
The console node 400 and agent node 300 are characterized
by an end-by-end relationship: a single console node 400
may be linked to a single agent node 300, or a single console
node 400 may be linked to a plurality of agent nodes 300, or
a plurality of console nodes 400 may be linked to a single
agent node 300, or a plurality of console nodes 400 may be
linked to a plurality of agent nodes 300.

In the preferred embodiment, the console node 400 comprises
four user-visible components: a Monitor component
402, a Collect graphical user interface (GUI) 404, an Analyze
component 406, and a Predict component 408. In one
embodiment, all four components 402, 404, 406, and 408 of
the console node 400 are part of the "BEST/1 FOR DISTRIBUTED
SYSTEMS" software package or the
"PATROL" software package, all available from BMC
Software, Inc. The agent node 300 comprises an Agent 302,
one or more data collectors 304, Universal Data Repository
(UDR) history files 210a, and Universal Data Format (UDF)
history files 212a. In alternate embodiments, the agent node
300 includes either of UDR 210a or UDF 212a, but not both.

The Monitor component 402 allows a user to monitor, in
real-time, data that is being collected by an Agent 302 and 
being sent to the Monitor 402. The Collect GUI 404 is 
employed to schedule data collection on an agent node 300.
The Analyze component 406 takes historical data from a 
UDR 210a and/or UDF 212a to create a model of the 
enterprise 100. The Predict component 408 takes the model 
from the Analyze component 406 and allows a user to alter 
the model by specifying hypothetical changes to the enter- 
prise 100. Analyze 406 and Predict 408 can create output in 
a format which can be understood and displayed by a 
Visualizer tool 410. In the preferred embodiment, Visualizer 
410 is the "BEST/1-VISUALIZER" available from BMC
Software, Inc. In one embodiment, Visualizer 410 is also 
part of the console node 400. 

The Agent 302 controls data collection on a particular 
computer system and reports the data in real time to one or 
more Monitors 402. In the preferred embodiment, the Agent 
302 is part of the "BEST/1 FOR DISTRIBUTED SYSTEMS"
software package available from BMC Software,
Inc. The data collectors 304 collect data from various 
processes and subsystems of the agent node 300. The Agent 
302 sends real-time data to the UDR 210a, which is a 
database of historical data in a particular data format. The 
UDF 212a is similar to the UDR 210a, but the UDF 212a 
uses an alternative data format and is written directly by the 
data collectors 304. 

FIG. 4 shows an overview of the Monitor component 402 
of the console node 400 of the enterprise management 
system 180. The Monitor 402 comprises a Manager Daemon 
430, one or more Monitor Consoles (as illustrated, 420a and 
420b), and a Policy Registration Queue 440. Although two
Monitor Consoles 420a and 420b are shown in FIG. 4, the
present invention contemplates that one or more Monitor 
Consoles may be executing on any of one or more console 
nodes 400. 

In the preferred embodiment, the Monitor Consoles 420a 
and 420b use a graphical user interface (GUI) for user input
and information display. Preferably, the Monitor Consoles
420a and 420b are capable of sending several different types
of requests to an Agent 302, including: alert requests, update 
requests, graph requests, and drilldown requests. An alert 
request specifies one or more thresholds to be checked on a 
routine basis by the Agent 302 to detect a problem on the 
agent node 300. For example, an alert request might ask the 
Agent 302 to report to the Monitor Console 420a whenever 
usage of a particular software process exceeds a particular 
threshold relative to overall CPU usage on the agent node 
300. An update request is a request for the status of the Agent 
302. For example, the requested status information might 
include the version number of the Agent 302 or the presence 
of any alarms in the Agent 302. A graph request is a request 
to receive graph data, i.e., data on a metric as routinely 
collected by the Agent 302, and to receive the data in real 
time, i.e., whenever it becomes available from the present 
time onward. By obtaining and displaying graph data, the 
Monitor Console 420a enables the rapid identification and 
communication of potential application and system perfor- 
mance problems. Preferably, the Monitor Console 420a 
displays graph data in a graphical format. A drilldown 
request is a request to receive drilldown data, i.e., data on an 
entire metric group (a set of metrics) as collected by the 
Agent 302. By obtaining and displaying drilldown data, the 
Monitor Console 420a provides the ability to focus, in 
real-time, on a specific set of processes, sessions, or users. 
Preferably, the Monitor Console 420a displays drilldown 
data in a tabular format. 
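
The four request types and the alert example above can be sketched in Python as follows. The class and function names are hypothetical and do not reflect the actual interface between the Monitor Console and the Agent; the sketch only illustrates an alert threshold expressed relative to overall CPU usage.

```python
from dataclasses import dataclass
from enum import Enum


class RequestType(Enum):
    """Request types a Monitor Console may send to an Agent."""
    ALERT = "alert"
    UPDATE = "update"
    GRAPH = "graph"
    DRILLDOWN = "drilldown"


@dataclass
class AlertRequest:
    process_name: str
    threshold: float  # allowed share of overall CPU usage, from 0.0 to 1.0


def alert_triggered(request, process_cpu, total_cpu):
    """True when the process exceeds its share of overall CPU usage."""
    if total_cpu <= 0:
        return False  # nothing is running, so no threshold can be exceeded
    return process_cpu / total_cpu > request.threshold


# Hypothetical alert: report when "db" uses more than half of the busy CPU time.
request = AlertRequest(process_name="db", threshold=0.5)
```

An Agent following this sketch would evaluate `alert_triggered` on a routine basis and report to the requesting Monitor Console whenever it returns True.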
Whenever the Agent 302 generates an alarm to indicate a
troublesome status on the agent node 300, the Manager
Daemon 430 intercepts the alarm and feeds the alarm to one
or more Monitor Consoles, such as 420a and 420b.
Typically, an alarm is a notification that a particular threshold
has been exceeded on a monitored process or subsystem
on an agent node 300. The Manager Daemon 430 is capable
of receiving alarms from a plurality of Agents 302. A
Manager Daemon 430 is preferably always running on each
console node 400 so that alarms can be captured even when
the Monitor Consoles 420a and 420b are offline.

Each of the Monitor Consoles 420a and 420b is able to
issue one or more policies. A policy defines a disparate set
of metrics to be collected on one or more agent nodes 300.
In other words, a policy allows a Monitor Console 420a or
420b to monitor one or more metrics on one or more agent
nodes 300 simultaneously. For example, a user could build
and deploy a policy that restricts web browser access on a
plurality of agent nodes 300 with the following set of
interrelated conditions: "IF more than 80% of server CPU is
required by critical production applications, AND the run
queue length is greater than six, AND active time on
production disks exceeds 40%." Policies are registered with
the Policy Registration Queue 440, from which they are
disseminated to the appropriate Agents 302. An Agent 302
can execute a plurality of policies simultaneously.
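
The example policy quoted above is a conjunction of three threshold conditions, which can be sketched in Python as follows. The metric names are hypothetical labels chosen for this illustration; only the three thresholds come from the text.

```python
import operator

# Hypothetical metric names; the limits are those of the quoted example policy.
CONDITIONS = [
    ("critical_app_cpu_pct", operator.gt, 80),
    ("run_queue_length", operator.gt, 6),
    ("production_disk_active_pct", operator.gt, 40),
]


def policy_triggered(metrics, conditions=CONDITIONS):
    """Return True only if every condition holds (logical AND)."""
    return all(compare(metrics[name], limit) for name, compare, limit in conditions)


busy = {"critical_app_cpu_pct": 85, "run_queue_length": 7,
        "production_disk_active_pct": 45}
calm = {"critical_app_cpu_pct": 85, "run_queue_length": 3,
        "production_disk_active_pct": 45}
```

With these values, `policy_triggered(busy)` holds because all three thresholds are exceeded, while `policy_triggered(calm)` does not because the run queue length is below the limit.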

FIG. 5 shows an overview of the Agent component 302 of
the agent node 300 of the enterprise management system
180. In the preferred embodiment, every agent node 300 has
one Agent 302. The Monitor Console 420c is another
instance of the Monitor Consoles illustrated in FIG. 4 with
reference numbers 420a and 420b.

When the user desires to start an Agent 302 and begin
collecting data on a particular agent node 300, the user
operates the Monitor Console 420c to issue an agent start
request through a Service Daemon 202b. Preferably, the
Service Daemon 202b is always executing on the agent node
300 in order to intercept messages from one or more Monitor
Consoles 420c even when the Agent 302 is offline. In the
preferred embodiment, the Service Daemon 202b is largely
invisible to the user. The Service Daemon 202b also intercepts
agent version queries from the Monitor Console 420c.
An agent version query is a request for the current version
number of the piece of software that comprises the Agent
302. As described above, the Monitor Console 420c is able
to send alert requests, update requests, graph requests, and
drilldown requests to the Agent 302. The Monitor Console
420c may also send collection requests, which are requests
for the Agent 302 to begin collecting particular metrics or
metric groups on the agent node 300.

When the Agent 302 receives a collect request from the
Monitor Console 420c through the Service Daemon 202b,
the Agent 302 initiates the collection through the Collect
Registry Queue (CRQ) 340. The Agent 302 uses the Collect
Registry Queue 340 to control and schedule data collection.
By helping the Agent 302 know how many collectors 304
are running and whether the collectors 304 are each the right
type, the Collect Registry Queue 340 prevents redundant
collection. Each data collector 310, 312, 314, 316, 318, and
320 is designed to gather one or more metrics for the
operating system and/or one or more subsystems. The
present invention contemplates a variety of data collectors
304, but for illustrative purposes, the following are shown:
system data collector 310 (which collects data from the
operating system), ARM data collector 312 (which collects
data from ARMed applications 324), UMX data collector
314 (which collects data from user scripts/programs 326),
Oracle data collector 316 (which collects data from an
"ORACLE" database management system), Informix data
collector 318 (which collects data from an "INFORMIX"
database management system), and Sybase data collector
320 (which collects data from a "SYBASE" database management
system). Each of the collectors 310, 312, 314, 316,
318, and 320 has an associated input queue 322a, 322b,
322c, 322d, 322e, and 322f, respectively. The input queues
322a, 322b, 322c, 322d, 322e, and 322f store the requested
metric groups and associated collection intervals for each
collector 304. Although a collector 304 typically supports
multiple metric groups, the collector 304 only collects those
metric groups that are requested. After metric data is
collected, the data is transferred to a Metric Repository 350.
The Metric Repository 350 sits between the Agent 302 and
the collectors 304 and provides fast interprocess communication
between the Agent process 302 and the collector
processes 304.

Metric data from the Metric Repository 350 is efficiently 
copied into the Metric Repository Pool 352, where the data 
is cached by metric group, instance, and collection rate. The 
Metric Repository Pool 352 is located in the memory space 
of the Agent 302 and is invisible to everything other than the 
Agent 302. By storing collected data for the metric groups 
in a single Metric Repository Pool 352 for each Agent 302 
and agent node 300, the enterprise management system 180 
prevents redundant collection: whether one Monitor Con- 
sole 420c or a plurality of Monitor Consoles such as 420a 
through 420c request data collection for a particular metric 
group, the data is only collected once. 
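
The collect-once behavior described above can be sketched as a cache keyed by metric group, instance, and collection rate. This is a minimal illustration only: the class `MetricPool`, its methods, and the `start_collection` callback are hypothetical names that do not appear in the specification.

```python
from typing import Callable, Dict, List, Set, Tuple

# (metric group, instance, collection interval in seconds)
PoolKey = Tuple[str, str, int]


class MetricPool:
    """Hypothetical stand-in for the Metric Repository Pool 352."""

    def __init__(self, start_collection: Callable[[PoolKey], None]) -> None:
        self._start_collection = start_collection
        self._cache: Dict[PoolKey, List[float]] = {}
        self._subscribers: Dict[PoolKey, Set[str]] = {}

    def request(self, console: str, group: str, instance: str, interval_s: int) -> None:
        key = (group, instance, interval_s)
        if key not in self._cache:
            # First request for this key: start collecting exactly once.
            self._cache[key] = []
            self._start_collection(key)
        self._subscribers.setdefault(key, set()).add(console)

    def store(self, key: PoolKey, value: float) -> None:
        self._cache[key].append(value)

    def read(self, console: str, key: PoolKey) -> List[float]:
        if console not in self._subscribers.get(key, set()):
            raise KeyError(f"{console} has not requested {key}")
        return list(self._cache[key])


started = []
pool = MetricPool(start_collection=started.append)
for console in ("420a", "420b", "420c"):
    pool.request(console, "cpu", "cpu0", 10)
pool.store(("cpu", "cpu0", 10), 0.42)
print(len(started))                            # collections started for three consoles
print(pool.read("420b", ("cpu", "cpu0", 10)))  # every console reads the same cached data
```

Three consoles request the same metric group, yet the collection callback fires once and each console reads the same cached values.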

In the preferred embodiment, the Collect Registry Queue 
340, Metric Repository 350, Metric Repository Pool 352, 
input queues 322a, 322b, 322c, 322d, 322e, and 322f, and
Universal Data Repository (UDR) history files 210a, 210b,
210c, and 210d comprise a data structure called a base queue
or BASEQ. A BASEQ is a contiguous relocatable heap of 
memory: in other words, the BASEQ provides random 
allocation of data in a contiguous block of storage. The 
BASEQ provides fast interprocess communication with 
locking synchronization between the consumer of data and 
the provider of data. The BASEQ can be stored in different 
types of memory, such as volatile memory like RAM or 
nonvolatile memory like a hard disk. In the preferred 
embodiment, the BASEQ is implemented as a base class in 
an object-oriented programming environment. In this 
embodiment, specialized variants of the BASEQ are imple- 
mented as derived classes which inherit the properties of the 
base class. For example, UDR 210a, 210b, 210c, and 210d
are implemented with a derived class which is located on a 
file on disk, while Metric Repository 350 is implemented 
with a derived class which is located in a shared memory 
segment. 
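
The base-class arrangement described above might look like the following sketch. It is an illustration under stated assumptions, not the BASEQ implementation: the class names, the bump allocator, and the use of a plain `bytearray` in place of a true shared memory segment are all assumptions. Data is addressed by offset rather than by absolute address, which is what lets the block be relocated.

```python
import mmap
import os
import tempfile
import threading


class BaseQ:
    """Hypothetical base class: one contiguous block addressed by offsets."""

    def __init__(self, buffer, size: int) -> None:
        self._buf = buffer              # any object supporting slice read/write
        self._size = size
        self._next = 0                  # simple bump allocator (an assumption)
        self._lock = threading.Lock()   # stands in for the locking synchronization

    def alloc(self, nbytes: int) -> int:
        """Reserve nbytes and return an offset, not an absolute address."""
        with self._lock:
            if self._next + nbytes > self._size:
                raise MemoryError("BASEQ block is full")
            offset = self._next
            self._next += nbytes
            return offset

    def write(self, offset: int, data: bytes) -> None:
        with self._lock:
            self._buf[offset:offset + len(data)] = data

    def read(self, offset: int, nbytes: int) -> bytes:
        with self._lock:
            return bytes(self._buf[offset:offset + nbytes])


class MemoryBaseQ(BaseQ):
    """Derived class held in volatile memory (compare Metric Repository 350)."""

    def __init__(self, size: int) -> None:
        super().__init__(bytearray(size), size)


class FileBaseQ(BaseQ):
    """Derived class backed by a file on disk (compare the UDR history files)."""

    def __init__(self, path: str, size: int) -> None:
        with open(path, "wb") as f:
            f.truncate(size)            # create a zero-filled file of the block size
        self._file = open(path, "r+b")
        super().__init__(mmap.mmap(self._file.fileno(), size), size)

    def close(self) -> None:
        self._buf.close()
        self._file.close()


def round_trip(queue: BaseQ, payload: bytes) -> bytes:
    offset = queue.alloc(len(payload))
    queue.write(offset, payload)
    return queue.read(offset, len(payload))


path = os.path.join(tempfile.mkdtemp(), "udr.bin")
disk_queue = FileBaseQ(path, 1024)
results = [round_trip(MemoryBaseQ(1024), b"cpu=0.42"), round_trip(disk_queue, b"cpu=0.42")]
disk_queue.close()
print(results)
```

Both derived classes inherit allocation, locking, and access from the base class and differ only in where the contiguous block lives.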

In the preferred embodiment, the enterprise management 
system 180 provides for the storage of historical metric data 
as well as the monitoring of real-time metric data. Therefore, 
in addition to passing the metric data to the Monitor Console 
420c, the Agent may also send the metric data to a Remote 
Repository 360 for storage. The Remote Repository 360 is 
located on the agent node 300, and each agent node 300 may 
have its own Remote Repository 360. The Remote Reposi- 
tory comprises a database in the Universal Data Repository 
(UDR) format 210b and/or a database in the Universal Data
Format (UDF) format 212b. The UDF 212b is an alternative
data format to the UDR 210b and is used primarily by older
ones of the collectors 304. The UDR format 210b is
multi-node: it can store data from multiple sources in one place.
UDR 210b is also multi-rate: it can store data at a plurality
of levels of varying granularity by sending data at each
successive level through an intelligent summarization
process according to the present invention. Historical data can
also be stored in a Central Repository 440 on the console
node 400. A Service Daemon 202a controls the data transfer
from the Remote Repository 360 to the Central Repository
440. In the preferred embodiment, the Central Repository
440 comprises a UDR 210d.

FIG. 6 illustrates an overview of the Analyze component
406 of the console node 400 of the enterprise management
system 180. In the preferred embodiment, Analyze 406 
comprises the "ANALYZE" portion of the "BEST/1 FOR 
DISTRIBUTED SYSTEMS" software package available 
from BMC Software, Inc. Essentially, Analyze 406 takes the
data collected by one or more Agents 302 and creates a
model of one or more computer systems and the processes 
that run on those computer systems. In the preferred 
embodiment, Analyze 406 can model multi-vendor
environments, system memory, multiple processors, disk
drives, logical volumes, RAID devices, load balancing,
ASCII and X terminals, local and remote file servers, 
independent and dependent transactions, client/server 
workloads, private and shared memory/transaction, CPU 
priority scheduling, networks of different types, and
"ORACLE", "SYBASE", and "INFORMIX" database
environments. In the preferred embodiment, Analyze 406 takes
as input a domain file 466 which identifies the agent nodes 
300 on the network and the relationship between them. As 
shown in FIG. 6, Analyze 406 also takes as input a data
repository in either UDF 212c or UDR 210c format, wherein
the data repository 212c or 210c is a set of metric groups 
collected from one or more agent nodes 300. 

The Analyze user then can either use a default workload 
specification (.an) 464 or create his or her own, either with
the supplied graphical user interface (GUI) 460 or with a
standard text editor 461. A workload specification 464 
includes a user name, a process name, and other information. 
A workload is a useful grouping of key performance metrics. 
For example, the user might classify a plurality of
Oracle-related processes as an "Oracle" workload, a plurality of
other processes as a "payroll" workload, and the remainder 
as a "miscellaneous" workload. From this classification data, 
the Analyze engine 406 creates an Analyze GUI file 462 
which contains a list of processes captured within the
analysis interval. The Analyze GUI file 462 is then passed to
the Analyze GUI 460. 
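
The classification of processes into workloads can be sketched as a small rule table matched against user name and process name. The rule format, the glob patterns, and the sample process list below are hypothetical; the actual workload specification (.an) syntax is not given in this description.

```python
from collections import defaultdict
from fnmatch import fnmatchcase

# Hypothetical rules: (workload name, user-name pattern, process-name pattern).
RULES = [
    ("Oracle", "*", "ora_*"),
    ("payroll", "payroll", "*"),
]


def classify(user: str, process: str) -> str:
    """Return the first workload whose patterns match, else "miscellaneous"."""
    for workload, user_pattern, process_pattern in RULES:
        if fnmatchcase(user, user_pattern) and fnmatchcase(process, process_pattern):
            return workload
    return "miscellaneous"


# Hypothetical captured processes: (user name, process name, CPU seconds).
processes = [
    ("oracle", "ora_pmon", 12.0),
    ("oracle", "ora_dbw0", 30.5),
    ("payroll", "calc_pay", 8.0),
    ("root", "cron", 0.5),
]

cpu_by_workload = defaultdict(float)
for user, name, cpu_seconds in processes:
    cpu_by_workload[classify(user, name)] += cpu_seconds
print(dict(cpu_by_workload))
```

Anything that matches no rule falls into the "miscellaneous" workload, mirroring the remainder category in the example above.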

Using the Analyze GUI file 462, the domain file 466, and 
the UDF 212c or UDR 210c data repository, Analyze 406 
can create several forms of output. First, Analyze 406 can
create a model file 468a. The model file 468a is a model of
the workload data as contained in UDF 212c or UDR 210c 
and as classified by the user through the Analyze GUI 460 
and/or standard text editor 461. Second, Analyze 406 can 
create reports 472a, which comprise the results of
user-specified queries concerning workload characteristics. For
example, one instance of reports 472a could be a list of the 
top ten workloads sorted by total CPU usage. Third, Analyze 
406 can create a Visualizer file 470a, wherein the Visualizer 
file 470a is a description of the characteristics of the
enterprise 100 as determined by the collected metrics and
the user input. The Visualizer file 470a can be read and 
utilized by the Visualizer tool 410. In the preferred 
embodiment, Visualizer 410 is the "BEST/1-VISUALIZER"
available from BMC Software, Inc. With Visualizer 410,
performance statistics and workloads can be graphed,
compared, drilled down, and visually analyzed to pinpoint
hot spots or trends to assist in resource management, system
tuning, and configuration changes. Visualizer 410 preferably
includes functionality known as MASF (Multivariate Adaptive
Statistical Filtering). Using standard deviation
techniques, MASF continually interprets performance data 
and calculates normalcy. MASF graphs are thus used to 
discover true performance anomalies that deviate from nor- 
mal performance behavior. In addition to creating Visualizer 
file 470a and reports 472a, Analyze 406 also generates 
Model files 468a for performance prediction of the system 
within an enterprise computing environment 100. 
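
The idea of using standard deviation to separate normal behavior from anomalies can be illustrated with a simple band around a baseline mean. This is a simplified sketch and not the MASF algorithm itself: the function name, the three-sigma threshold, and the sample data are assumptions.

```python
from statistics import mean, stdev


def anomalies(baseline, observed, k=3.0):
    """Return (index, value) pairs of observations outside mean +/- k * stdev."""
    mu = mean(baseline)
    sigma = stdev(baseline)
    lower, upper = mu - k * sigma, mu + k * sigma
    return [(i, x) for i, x in enumerate(observed) if not lower <= x <= upper]


# Hypothetical CPU utilization readings (percent) for the same hour on past days.
baseline_cpu = [41.0, 39.5, 40.2, 42.1, 38.8, 40.9, 41.4, 39.7]
today_cpu = [40.3, 41.8, 55.0, 39.9]
print(anomalies(baseline_cpu, today_cpu))
```

Only the reading that falls well outside the band derived from the baseline is reported; ordinary fluctuation is treated as normal.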

FIG. 7 shows an overview of the Predict component 408 
of the console node 400 of the enterprise management 
system 180. In the preferred embodiment, Predict 408
comprises the "BEST/1-PREDICT" component of the 
"BEST/1 FOR DISTRIBUTED SYSTEMS" software pack- 
age available from BMC Software, Inc. Predict 408 is a 
planning tool which forecasts the impact of hypothetical 
changes on elements of the enterprise 100 such as disparate 
hardware, software, applications, and databases. Predict 408 
takes the workload data from a Model File 468c, such as the 
Model File 468a generated by Analyze 406, and computes 
performance statistics such as workload response times, 
utilization, and throughputs at CPUs, disks, networks, and 
other elements of the enterprise computing environment 
100. Thus, Predict 408 constructs a baseline model from 
collected data that represents the essence of the system 
under management. The user can also operate Predict 408 to 
construct the baseline model from pre-built model 
components, or from a combination of collected data and 
pre-built components. Preferably, Predict 408 uses a graphi- 
cal user interface (GUI) for user input and information 
display. 

After the baseline model has been constructed, the user 
can modify the baseline model by specifying configuration
corrections, configuration changes, and/or growth scenarios. 
With Predict 408, the user can change one or more attributes 
of any model, creating "what if?" or hypothetical scenarios. 
By using methods, modeling techniques, and statistical 
formulas taken from queuing theory, Predict 408 accurately 
determines the impact of these workload and configuration 
changes on performance and response time. As one of the 
results of "what if?" computation, the changes to the base- 
line are displayed as unitless, numerical response time 
values relative to the baseline value of one. In the preferred 
embodiment, response times are broken down into four key 
components: CPU service time and wait time, I/O service 
time and wait time, network service time and wait time, and 
wait time for transactions running on external systems. 
Using the four key components, Predict 408 also preferably 
calculates other critical performance metrics such as
throughput rates, CPU queue lengths, disk queue lengths, 
paging rates, and the amount of memory required to elimi- 
nate excessive paging. 
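
A "what if?" result expressed relative to a baseline of one can be illustrated with the textbook single-server queueing formula R = S / (1 - U), where S is the service time and U is the utilization. This formula is a stand-in chosen for illustration; the description does not state which queueing models Predict 408 applies, and the function names and figures below are assumptions.

```python
def response_time(service_time_s: float, arrival_rate_per_s: float) -> float:
    """Response time of a single-server queue: R = S / (1 - U), with U = rate * S."""
    utilization = arrival_rate_per_s * service_time_s
    if utilization >= 1.0:
        raise ValueError("server is saturated; no steady-state response time")
    return service_time_s / (1.0 - utilization)


def relative_response(service_time_s: float, baseline_rate: float, scenario_rate: float) -> float:
    """Unitless response time of a scenario, with the baseline defined as one."""
    baseline = response_time(service_time_s, baseline_rate)
    scenario = response_time(service_time_s, scenario_rate)
    return scenario / baseline


# Hypothetical scenario: a 20 ms service time and 25% growth in transaction rate.
growth = relative_response(service_time_s=0.02, baseline_rate=30.0, scenario_rate=37.5)
print(round(growth, 3))
```

In this example, utilization rises from 60% to 75% and the relative response time grows faster than the workload does, which is the kind of nonlinear effect a baseline-relative value makes visible.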

Predict 408 preferably includes a multivendor hardware 
table 469, wherein the table includes the hardware specifications
that Predict 408 uses to calculate the performance of
hypothetical changes to the enterprise 100. Therefore, 
changes to CPU, memory, I/O, priorities, transaction rates, 
and other attributes can be evaluated across a plurality of 
heterogeneous computer systems 150. Furthermore, in mod- 
eling the configuration and workload changes across mul- 
tiple systems, Predict 408 automatically calculates interac- 
tion and interference between systems. Predict 408 also 
preferably provides scenario planning, or modeling incre- 
mental growth over time, in order to determine the life 
expectancy of computing resources and the point at which 
resources should be upgraded to ensure that performance
remains at an acceptable level. In the various ways set forth
above, Predict 408 thus permits a user to plan for the future
by "test driving" both actual and alternative or hypothetical
configurations of the enterprise 100.

Like Analyze 406, Predict 408 can generate reports 472b, 
a Visualizer file 470b, and a model file 468b. The model file
468b can be modified and passed back to Predict 408 for 
additional modeling. 

Collecting, Sampling, and Statistically Recreating Metric 
Data 

Performance measurement is the process of gathering data 
concerning the state of the hardware and/or software of a 
computer system. In one embodiment, system software 
and/or data collectors 304 continually monitor one or more 
elements of the computer system and collect raw metric data 
relating to system performance, preferably at a high fre- 
quency. The metric data is written to a memory and peri- 
odically updated. The memory is preferably a registry of 
metrics. Often, different metrics are not updated at the same 
time or in the same interval. However, it is assumed that the 
raw data in the registry accurately reflects the system state 
of interest. 

In a preferred embodiment, data in the registry is not used 
directly. Rather, the data is periodically sampled from the 
registry of metrics indirectly through the process of second- 
hand sampling. Such second-hand sampling is preferably 
performed less frequently than the frequency at which data 
is collected and placed into the registry of metrics. Because 
second-hand sampling itself uses system resources such as 
I/O, storage space, and CPU time, it is impractical and 
inefficient to sample the registry of metrics at a very high 
frequency: that is, at a frequency nearing the usually high 
frequency at which raw data is written to the registry of 
metrics. On the other hand, if data is sampled from the 
registry too infrequently, then a model created with the 
second-hand data may not be as accurate as desired. For 
example, significant short-lived events and/or processes can 
be missed altogether if the interval between samples is too 
large. Infrequent sampling may therefore distort a model of 
a system's performance. The degree to which the sampled 
data reliably reflects the raw data determines the usefulness 
of the performance model for system capacity planning. The 
degree of reliability also determines the usefulness of the 
performance statistics presented to end-users by perfor- 
mance tools. 

Sensitivity to sampling frequency varies among data 
types. Generally, performance data can be classified into 
three categories: cumulative, transient, and constant. Cumu- 
lative data is data that accumulates over time. For example, 
a system CPU time counter may collect the total number of 
seconds that a processor has spent in system state since 
system boot. With transient data, old data is replaced by new 
data. For example, the amount of free memory is a transient 
metric which is updated periodically to reflect the amount of 
memory not in use. However, values such as the mean, 
variance, and standard deviation can be computed based on 
a sampling history of the transient metric. The third type of 
performance data, constant data, does not change over the 
measurement interval or lifetime of the event. For example, 
system configuration information, process ID, and process 
start time are generally constant values. Of the three data 
types, transient performance metrics are the most sensitive 
to variations in the sample interval and are therefore the 
most likely to be characterized by uncertainty. For example, 
with infrequent sampling, some state changes may be 
missed completely. However, cumulative data may also be 
rendered uncertain by infrequent sampling, especially with 
regard to the variance of such a metric.

The following table contains a nonexhaustive list of
examples of major performance metrics and their data types.
The table also presents guidelines as to how often the
metrics should preferably be sampled.



METRIC                         DATA TYPE     SAMPLE RATE (SEC)

Disk queue length              Cumulative    5 to 15
CPU queue length               Cumulative    5 to 15
I/O counts                     Cumulative    5 to 15
Number of processes            Cumulative    5 to 15
Memory in use                  Transient     1 to 3
Memory size                    Constant      3600
Disk busy time                 Transient     1 to 3
In (out) network packets       Cumulative    5 to 15
Number of bytes in a packet    Cumulative    5 to 15
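
The difference in sampling sensitivity between cumulative and transient data can be sketched as follows. The function names and readings are hypothetical. The average rate of a cumulative counter needs only its first and last samples, whereas the variance of a counter's per-interval rates and the statistics of a transient metric depend on every sample taken.

```python
from statistics import mean, pvariance


def counter_rate(samples):
    """Average rate of a cumulative counter from (time_s, value) samples.

    Only the first and last samples matter, so the average tolerates a
    long sample interval.
    """
    (t0, v0), (t1, v1) = samples[0], samples[-1]
    return (v1 - v0) / (t1 - t0)


def interval_rates(samples):
    """Per-interval rates; their spread depends on how often we sample."""
    return [(v1 - v0) / (t1 - t0) for (t0, v0), (t1, v1) in zip(samples, samples[1:])]


def transient_summary(values):
    """Mean and population variance over the sampling history of a transient metric."""
    return mean(values), pvariance(values)


# Hypothetical readings: system-state CPU seconds since boot, and free memory in MB.
cpu_seconds = [(0, 1200.0), (10, 1203.0), (20, 1209.0)]
free_mb = [512, 498, 530, 476]

print(counter_rate(cpu_seconds))
print(interval_rates(cpu_seconds))
print(transient_summary(free_mb))
```

Dropping the middle CPU sample leaves `counter_rate` unchanged but hides the variation that `interval_rates` exposes, while dropping any free-memory sample changes both the mean and the variance.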



Performance data is collected according to a measurement 
structure, wherein the measurement structure comprises a 
measurement interval and sample interval or number of
samples. The measurement interval or collection interval L 
is a continuous time segment during which raw performance 
data is collected. The measurement interval is delineated by 
its beginning time and its ending time. The sample interval 
$\Delta$ is the time between two consecutive samples. In the
preferred embodiment, the sample interval is a constant 
value. The number of samples n is the total number of 
samples taken during the measurement interval. The
relationship among these three parameters is:

$$n = \frac{L}{\Delta}$$

The events being sampled may include, for example, 
process lifetimes, process types, or disk access times, or any 
other performance metrics that can be monitored. Although 
this description addresses in detail examples such as CPU
utilization, process lifetime, and process type, the system 
and method can be applied to any metric. As used herein, 
"process" refers to an executing program, a task, a thread, or 
any other unit of execution. 

FIG. 8 is a flowchart illustrating an overview of the
collection and sampling of metric data. In step 700 raw
performance data is collected by system software or data
collectors 304 at a high frequency. The raw performance
data relates to one or more processes on one or more
computer systems or networks. In step 702 the raw data
points are stored and/or updated in the registry of metrics. As
shown in step 704, the collecting and updating steps 700 and
702, respectively, are performed for as long as the sample
interval $\Delta$ has not expired. When the sample interval has
expired, in step 706 the registry of metrics is sampled. The
sampling creates a set of sampled data points. As shown in
step 708, steps 700, 702, 704, and 706 are performed
repetitively as long as the measurement interval L has not
expired. When the measurement interval L has expired, the
collection and sampling end.
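
The loop of FIG. 8 can be sketched with a simulated clock. This is an illustration only: the function and parameter names, the integer millisecond clock, and the synthetic metric source are assumptions, and no real collector or registry API is implied.

```python
def collect_and_sample(measure_ms, sample_ms, collect_ms, read_metric):
    """Simulate FIG. 8 on an integer millisecond clock (no real sleeping)."""
    registry = {}                  # registry of metrics, overwritten in place
    samples = []                   # second-hand samples taken from the registry
    now = 0
    next_sample = sample_ms
    while now < measure_ms:                        # step 708: interval L not expired
        now += collect_ms
        registry["cpu_busy"] = read_metric(now)    # steps 700 and 702: collect, store
        if now >= next_sample:                     # step 704: sample interval expired
            samples.append((now, dict(registry)))  # step 706: sample the registry
            next_sample += sample_ms
    return samples


# Hypothetical metric source: a deterministic sawtooth instead of a real probe.
samples = collect_and_sample(
    measure_ms=60_000, sample_ms=5_000, collect_ms=100,
    read_metric=lambda t: (t // 100) % 7,
)
print(len(samples), samples[0])
```

With these settings the registry is updated 600 times but sampled only 12 times, so every sample reflects just the most recent raw value written before it.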

For performance modeling, two measurements are often 
key: the duration of an event (e.g., a process), or its service 
time; and the number of events, which is equivalent to the 
arrival rate times the length of the measurement interval. 
Reducing the uncertainty associated with these two key
measurements is a goal of the system and method set forth 
in this description. If a process record is created at the time 
when the process is created and if the process is sampled at 
least once, then the exact starting time (i.e., birth) of the 
process can usually be obtained. Furthermore, if the ending
time (i.e., death) of the process is also recorded and the
record is sampled at least once, then the exact length of the
process (i.e., the process lifetime) can be computed. In most
systems, however, the ending time record is not kept, and 
therefore the exact ending time and length of the process are 
unknown. Therefore, other methods must be used to esti- 
mate the ending time and the process length. 

Uncertainty arises from two primary sources: the 
unsampled segment of a seen event or process, and the 
unseen, short-lived event or process. FIG. 9 is a diagram 
illustrating an unsampled segment of a seen event. The 
horizontal line designated "Time" indicates increasing time 
from left to right. The timeline encompasses all or part of the 
measurement interval L. The vertical lines labeled $s_{(i1)-2}$
through $s_{(im_i)+3}$ indicate samples taken at a constant sample
interval $\Delta$. The event or process 610 begins at the point in
time $b_i$ and ends at the point in time $d_i$. The process 610
begins after sample $s_{(i1)-1}$ but before sample $s_{i1}$, so the
process 610 is not detected at the point in time $b_i$ when it
begins. However, the process 610 is still executing when
sample $s_{i1}$ is taken, so the existence of this process 610 is
known at that point. In other words, the process 610 is a seen
process or a known process as soon as the first sample $s_{i1}$ is
taken. Furthermore, in a preferred embodiment, the starting
time $b_i$ of the process 610 is also determined when the
process 610 is detected at sample $s_{i1}$. After it has first been
sampled, the process 610 continues executing for an indefinite
period of time, as indicated in FIG. 9 by broken lines,
wherein the process lifetime may or may not encompass
additional samplings of the process 610 at regular sample
intervals. The last sampling of the process 610, and therefore
the last time the process 610 is seen, is the $m_i$-th sample at
the point in time $s_{im_i}$. The present invention contemplates
that a seen process may be sampled only once, and thus that
$s_{i1} = s_{im_i}$ in some cases. The process 610 stops executing at the
point in time $d_i$, after $s_{im_i}$ but prior to $s_{(im_i)+1}$. In the preferred
embodiment, however, no record is kept of the termination
of the process 610, and so the length of the process 610 after
$s_{im_i}$ is unknown. Therefore, the known, captured, or sampled
length 612 of the seen process 610 is represented by the
difference between $s_{im_i}$ and $b_i$. The unsampled or unknown
length 614 of the seen process 610 is represented by the
difference between $d_i$ and $s_{im_i}$. The unsampled segment 614
is also known as the residual process time. The captured 
utilization is the sampled length 612 divided by the mea- 
surement interval L. 
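
The geometry of FIG. 9 can be worked through for a single process. The sketch below assumes that samples fall at integer multiples of the sample interval and that a sample landing exactly on the start or end time does not catch the process; the function name and the example times are hypothetical. The true end time is supplied here only to show what sampling cannot observe.

```python
import math


def observe(b, d, delta):
    """Return what sampling reveals about a process living on [b, d).

    Assumptions for this sketch: samples fall at integer multiples of delta,
    and a sample exactly at b or d does not catch the process.
    """
    first = (math.floor(b / delta) + 1) * delta   # s_i1: first sample after b
    if first >= d:
        return None                               # no sample falls inside the lifetime
    last = (math.ceil(d / delta) - 1) * delta     # s_im_i: last sample before d
    return {
        "first_sample": first,
        "last_sample": last,
        "hits": (last - first) // delta + 1,      # m_i, the number of samplings
        "sampled_length": last - b,               # length 612
        "residual": d - last,                     # length 614, unknown in practice
    }


seen = observe(b=12, d=31, delta=5)
once = observe(b=12, d=17, delta=5)
captured_utilization = seen["sampled_length"] / 60   # measurement interval L = 60
print(seen)
print(once)
print(captured_utilization)
```

The second call shows the case in which the first and last samples coincide because the process is sampled only once.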

FIG. 10 is a diagram illustrating an unseen event. Again, 
the horizontal line designated "Time" indicates increasing 
time from left to right and encompasses all or part of the 
measurement interval L. The vertical lines labeled $s_0$
through $s_3$ indicate samples taken at a constant sample
interval $\Delta$. The event or process 620 begins at the point in
time $b_i$ and ends at the point in time $d_i$. In this instance,
however, the process 620 begins and ends within the sample
interval $\Delta$ and between two samples. Therefore, the process
620 is unseen and its length is known only to be less than $\Delta$.
The unseen length 622 is represented by the difference
between $d_i$ and $b_i$.
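
Taken together, the sampled length 612, the unsampled length 614, and the unseen length 622 account for all of the time that processes actually ran. The sketch below checks this on a small, made-up set of processes whose true start and end times are known. It assumes samples at integer multiples of the sample interval; the function name and data are hypothetical. In a real system only the first of the three totals is directly observable.

```python
import math


def decompose(processes, delta, L):
    """Split true busy time into captured, uncaptured, and unseen shares of L."""
    captured = uncaptured = unseen = 0.0
    for b, d in processes:
        first = (math.floor(b / delta) + 1) * delta   # first sample after b
        if first >= d:
            unseen += d - b                           # whole lifetime between samples
            continue
        last = (math.ceil(d / delta) - 1) * delta     # last sample before d
        captured += last - b                          # sampled length
        uncaptured += d - last                        # residual after the last sample
    return captured / L, uncaptured / L, unseen / L


# Hypothetical (start, end) times within a measurement interval of 60.
processes = [(12, 31), (16, 19), (41, 43), (44, 52)]
captured, uncaptured, unseen = decompose(processes, delta=5, L=60)
actual = sum(d - b for b, d in processes) / 60
print(captured, uncaptured, unseen, actual)
```

The three shares add up to the actual utilization, so whatever sampling fails to capture must be recovered by estimating the uncaptured and unseen shares.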

For a computer system or plurality of systems with 
hundreds or thousands of processes starting and ending 
within a measurement interval, the uncertainty adds up 
rapidly and can distort a performance model. However, 
statistical methods according to the present invention can 
provide estimations of the uncertain data, thus recreating the 
lost data and reducing uncertainty. FIG. 11 is a flowchart 
illustrating an overview of the statistical estimation of metric 
data. The difference between the "actual" total utilization
and the "sampled" total utilization (in other words, the
uncertainty) can be distributed both to the unsampled
segments of the seen events or processes and to the unseen
events or processes. Accordingly, in step 720 of FIG. 11 the
total uncaptured utilization $U_{uc}$ is estimated. $U_{uc}$
represents an estimate of the total unsampled utilization of
all seen processes over the measurement interval L. In step
722 the total unseen utilization $U_{us}$ is estimated. $U_{us}$
represents an estimate of the total utilization of all unseen
processes for the measurement interval L.

FIG. 12 is a flowchart illustrating the determination of the
total uncaptured utilization $U_{uc}$. In step 738 the
measurement interval L is determined. The steps thereafter are
performed for measurements within the interval L. In step
740 one or more process service time distributions are
determined, wherein the quantity of distributions is labeled
d. A process service time distribution is a statistical
distribution which determines the duration of one or more
processes. In step 742, the quantity $n_{cp}$ of seen processes which
follow each distribution is determined. In other words, in
steps 740 and 742 the seen processes are divided into d
groups, wherein each group represents processes that are
characterized by the same process service time distribution.
In step 744 a mean residual time is determined for each
process service time distribution j. The mean residual time
$\bar{r}_j$ represents the average expected difference
$d_i - s_{im_i}$ for all the processes which are characterized by
the same process service time distribution j. In other words,
the mean residual time $\bar{r}_j$ represents the average
unsampled length related to the unsampled segment 614.

In step 746 the total uncaptured utilization $U_{uc}$ is
determined according to the following equation:

$$U_{uc} = \frac{\sum_{j=1}^{d} \bar{r}_j \, n_{cpj}}{L}$$

wherein d is the number of process service time
distributions, $\bar{r}_j$ is the mean residual time for each
distribution j, $n_{cpj}$ is the number of seen processes for each
distribution j, and L is the measurement interval. In other
words, the total uncaptured utilization $U_{uc}$ is the sum of the
products of the mean residual time and the number of seen
processes for each distribution, all divided by the
measurement interval. If there is only one process service time
distribution, however, then the total uncaptured utilization
$U_{uc}$ can be determined according to a simplified equation:

$$U_{uc} = \frac{\bar{r} \, n_{cp}}{L}$$

wherein $\bar{r}$ is the mean residual time for the distribution,
$n_{cp}$ is the number of seen processes for the distribution, and
L is the measurement interval.

FIG. 13 illustrates the general determination of the total
uncaptured utilization $U_{uc}$ for any process service time
distribution. In step 750 the process service time distribution
is determined. In step 752 the quantity of seen processes
which follow this distribution is determined.

In step 754 the conditional probability $G_t(r)$ of residual
time $R \le r$, given that the process time $X > t$, is determined
as follows:

$$G_t(r) = P(R \le r \mid X > t) = \frac{P(t < X \le t + r)}{P(X > t)}$$

wherein t is the last sample time and r is the unsampled
segment length.

In step 756 the mean residual time $\bar{r}$ for this distribution
is determined according to the following equation:

$$\bar{r} = \int_0^{\Delta} r \, dG_t(r)$$

wherein $\Delta$ is the sample interval. With steps 750 through
756, the system and method are applicable to any process
service time distribution. Nevertheless, a discussion of
several exemplary distributions follows.

FIG. 14 is a flowchart illustrating the determination of the
portion of the total uncaptured utilization for an exponential
distribution. In step 760 the process service time distribution
is determined to be an exponential distribution with service
rate $\lambda$. In step 762 the quantity $n_{cp}$ of seen processes
which follow the exponential distribution with rate $\lambda$ is
determined. In step 764 the mean residual time $\bar{r}$ for the
exponential distribution is determined according to the
following equation:

$$\bar{r} = \frac{1}{\lambda} - \left(\Delta + \frac{1}{\lambda}\right) e^{-\lambda \Delta}$$

wherein $\Delta$ is the sample interval.

FIG. 15 is a flowchart illustrating the determination of the
portion of the total uncaptured utilization for a uniform
distribution. In step 780 the process service time distribution
is determined to be a uniform distribution between zero and
a constant C. In step 782 the quantity $n_{cp}$ of seen processes
which follow the uniform distribution between zero and C is
determined. In step 784 the mean residual time $\bar{r}$ for the
uniform distribution is determined according to the
following equation:

$$\bar{r} = \frac{\min(C - t, \Delta)}{2}$$

wherein t is the average difference between the last sampling
time $s_{im_i}$ and the beginning time $b_i$ (the sampled length
612) [...] normalized to 1 for the seen processes which follow
the uniform distribution between zero and C, and thus
wherein $0 \le t \le C$, and wherein $\Delta$ is the sample interval.
The mean residual time for the uniform distribution depends
upon the sampled process time: the more time that has
already been captured, the less the expected residual time.

FIG. 16 is a flowchart illustrating the determination of the
portion of the total uncaptured utilization for an unknown
distribution. In step 800 the process service time distribution
is determined to be an unknown distribution. In step 802 the
quantity $n_{cp}$ of seen processes which follow the unknown
distribution is determined. In step 804 the mean residual
time $\bar{r}$ for the unknown distribution is determined
according to the following equation:

$$\bar{r} = \frac{\sum_{i=1,\; s_{i1} > 0}^{n} (s_{i1} - b_i)}{\sum_{i=1,\; s_{i1} > 0}^{n} 1}$$

wherein n is the total quantity of processes, $s_{i1}$ is the first
sampling time for each process i, $s_{i1} = 0$ for an unseen
process, and $b_i$ is a beginning time for each process i. The
rationale
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for the equation is that the residual time mirrors s ir b ( , the them will be seen. Because the percentage of short-lived, 

process time prior to the first sample time. seen processes can be estimated as described below, the total 

FIG. 17 is a flowchart illustrating an alternative method of number of unseen processes can be estimated as well, 

the determination of the portion of the total uncaptured The iterative method for determining the number of 

utilization for an unknown distribution. In step 820 the s unseen processes will function for any sample interval. For 

process service time distribution is determined to be an simplicity of computation, however, let the sample interval 

unknown distribution. In step 822 the quantity n c of seen A- 1. Application of the iterative method to a sample interval 

processes which follow the unknown distribution's deter- A ** wiU be readilv a PP a rent to one skilled in the art. First, 

mined. In step 824 the mean residual time ? for the unknown the P ro ? es s es are P| aced pto a quantity of buckets. In 

distribution is determined according to the following equa- 10 a coniputer-implemented version of this method a bucket 

tion . would preferably be a memory location, and the plurality of 

buckets would preferably be an ordered plurality of memory 
locations such as a matrix, one-dimensional array, linked 

X to' list, or other ordered data structure. The quantity of buckets 

r = — , is mxn, wherein n is the maximum number of times that any 

n *> 15 process has been sampled or hit, and wherein m is an 

arbitrary multiple of n. Preferably, m should be chosen such 

wherein CP is a set of all seen processes which follow the mat there wil1 be > 0Q avera S e > at least 20 processes in each 

unknown distribution, s u is the first sampling time for each bucket. ^ m buckets are divided evenly into n groups, as 

seen process $i \in CP$, and $b_i$ is the beginning time for each process $i \in CP$. The rationale for the equation is that the residual time mirrors $s_{1i} - b_i$, the process time prior to the first sample time.

FIG. 18 is a flowchart illustrating the determination of the total unseen utilization. In step 840 a total captured utilization $U_c$ is determined. The total captured utilization $U_c$ is the sum of the sampled lengths of all seen processes over the measurement interval L and can be computed as follows:

$$U_c = \frac{1}{L} \sum_{i \in CP} (s_{li} - b_i)$$

wherein CP is a set of all seen processes, $s_{li}$ is the last sampling time for each seen process $i \in CP$, $b_i$ is the beginning time for each seen process $i \in CP$, and L is the measurement interval.

In step 842 a total measured utilization $U_m$ is determined. The total measured utilization represents the total utilization of all processes of interest, seen and unseen, over the measurement interval L. Step 842 assumes that universal utilization statistics are available from the registry of metrics, system software, or other monitoring software. In step 844 the total unseen utilization $U_{us}$ is determined according to the following equation:

$$U_{us} = U_m - U_c - U_{uc}$$

In other words, the uncertainty is the difference between the "actual" utilization $U_m$ and the "sampled" utilization $U_c$. The uncertainty (i.e., $U_m - U_c$) is the sum of the uncaptured utilization $U_{uc}$ and the unseen utilization $U_{us}$, so the unseen utilization $U_{us}$ can be determined once the uncaptured utilization $U_{uc}$ and the uncertainty are known. As discussed above, FIG. 12 illustrates how the uncaptured utilization $U_{uc}$ can be computed in one embodiment.

If, however, universal utilization statistics are not available, then the total measured utilization $U_m$ cannot easily be determined. Nevertheless, the total unseen utilization may still be determined according to an iterative method which is described as follows. The following method is also useful when the processes of interest do not represent all the activity on the computer system. If so, it is assumed that the processes are marked in such a way that it can be determined which ones are of interest. Furthermore, it is assumed in all cases that the start time of processes is independent of the sample time. Therefore, if there are very many short-lived processes, then a certain percentage of

illustrated by FIG. 19. A process is placed in one of the buckets in the tth row, wherein t is the number of times the process was sampled or hit. Within each row, the buckets represent $m/n$ segments of the sample interval $\Delta$. A process sampled t times and starting within the ith segment of the sample interval is placed in the bucket labeled with the following value:

$$(t - 1)\frac{m}{n} + i, \quad \text{wherein } i = 0, 1, \ldots, \frac{m}{n} - 1.$$

For example, let the maximum observed time for a process be $e_{max}$. Because $e_{max} < \Delta$, no process can be seen more than once, and thus $n = 1$. A derivative assumption is that no process lives longer than $e_{max}$; otherwise, a longer-lived process probably would have been seen. FIG. 20 illustrates this example with m buckets labeled $B_0$ through $B_{m-1}$. Thus, the last non-empty bucket will be bucket $B_i$, where i is the smallest integer that is greater than or equal to $e_{max} \times m - 1$, i.e., i is the ceiling of $e_{max} \times m - 1$, denoted by $\lceil e_{max} \times m - 1 \rceil$. In other words, $e_{max} m - 1 \le i < e_{max} m$. As shown in FIG. 20, all of the processes in bucket $B_i$ will have length between $\frac{i}{m}$ and $e_{max}$.

Let bucket $B_i$ have a quantity of processes $f_i$ of length between $\frac{i}{m}$ and $e_{max}$. The assumption of uniformity of start time implies that other processes of length between $\frac{i}{m}$ and $e_{max}$ will not be seen at all because they began and terminated too early, and that others of the same length will have been placed in other buckets because they started
closer to the sampling time and continued to live afterwards. In fact, for any $a < e_{max}$, the number of processes that start between $a$ and $a + \frac{1}{m}$ after a sample is taken and have length between $\frac{i}{m}$ and $e_{max}$ will be approximately $f_i$ as well. Therefore, the total number of processes of length between $\frac{i}{m}$ and $e_{max}$ is approximately $m f_i$. Of course, those processes of the same length that started early in the collection cycle will be unseen. Their number can be estimated as approximately $m(1 - e_{max}) f_i$.

In the next bucket, $B_{i-1}$, all processes have length at least $\frac{i-1}{m}$. The initial quantity of processes in this bucket are counted. Then subtract from the initial quantity $f_i$, or the estimated quantity of these processes that have length greater than $\frac{i}{m}$. The remaining processes, whose quantity we will designate $f_{i-1}$, have length between $\frac{i-1}{m}$ and $\frac{i}{m}$. The total number of processes of length between $\frac{i-1}{m}$ and $\frac{i}{m}$ is $f_{i-1} m$, and of that quantity, $f_{i-1}(m - i)$ are unseen. In this way, the number of processes in the remaining buckets can be calculated by calculating one bucket at a time and then proceeding to the next bucket (i.e., the bucket with the next shortest processes).

In general, to compute the quantity of processes of length between $\frac{j-1}{m}$ and $\frac{j}{m}$, look at the quantity in the bucket labeled (j-1). Subtract from this quantity the estimates of the longer processes (previously calculated) that landed in this bucket. This new quantity is $f_{j-1}$. Multiply $f_{j-1}$ by m, since all m buckets in the sample interval $\Delta$ are equally likely to have the processes of this length. Of this number, the fraction that is unseen is $1 - \frac{j}{m}$. Therefore, the total number of unseen processes of length between $\frac{j-1}{m}$ and $\frac{j}{m}$ is

$$\left(1 - \frac{j}{m}\right) m f_{j-1} = (m - j) f_{j-1}.$$

In this way, the number of processes of a given length less than the sample interval can be calculated. As shown in FIG. 21 and as described in detail above, in step 860 create m×n buckets. In step 862 place each of the seen processes into the appropriate bucket, as described above. In step 864 start at the highest-ranked bucket: that is, the bucket with the longest processes. In step 866 count the number of processes in the current bucket (in the first case, the highest-ranked bucket). In step 868, as described in detail above, subtract from the count the fraction of the processes that were previously counted for higher-ranked buckets (when looking at the highest-ranked bucket, subtract zero processes). In step 870, multiply the difference by m, the number of buckets per sample interval $\Delta$. In step 872, estimate the fraction of the product of step 870 which are unseen processes. In step 874, decide whether this is the lowest-ranked bucket. If it is the lowest-ranked bucket, then we stop. If it is not the lowest-ranked bucket, then in step 876 look at the bucket of the next lower rank and go through the process again, starting at step 866. With this iterative technique, the number of unseen processes can be estimated for any number of segments of the sample interval $\Delta$.
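The loop of steps 860 through 876 can be sketched in Python for the single-row case (n = 1, every seen process sampled exactly once). This is a minimal illustration under stated assumptions, not the patented implementation: the function name `estimate_unseen_counts`, its parameters, and the convention that each seen process is represented by its observed time (sample time minus start time) are all hypothetical, and observed times are assumed to be positive and shorter than the sample interval.

```python
import math

def estimate_unseen_counts(observed, m, delta=1.0):
    """Estimate unseen process counts per bucket (single-row case, n = 1).

    observed: observed times (sample time minus start time) of processes
              seen exactly once; assumed positive and shorter than delta.
    m:        number of equal segments (buckets) per sample interval.
    Returns (f, unseen), both indexed by bucket number 0..top.
    """
    e_max = max(observed) / delta            # longest observed time, in units of delta
    top = math.ceil(e_max * m) - 1           # highest-ranked non-empty bucket
    counts = [0] * (top + 1)
    for x in observed:                       # steps 860-862: fill the buckets
        counts[min(int(x / delta * m), top)] += 1

    f = [0.0] * (top + 1)
    unseen = [0.0] * (top + 1)
    longer = 0.0                             # longer processes expected to land in lower buckets
    for j in range(top, -1, -1):             # steps 864-876: highest rank first
        f[j] = max(counts[j] - longer, 0.0)  # step 868: remove the longer processes
        total = m * f[j]                     # step 870: scale to all m start segments
        upper = e_max if j == top else (j + 1) / m
        unseen[j] = (1.0 - upper) * total    # step 872: portion that was never sampled
        longer += f[j]
    return f, unseen

# Ten seen processes, four buckets per sample interval (delta = 1).
f, unseen = estimate_unseen_counts([0.1] * 5 + [0.3] * 3 + [0.6, 0.7], m=4)
```

For buckets below the top one, the unseen estimate reduces to (m - (j + 1)) times f[j]; for the top bucket it is m(1 - e_max) times f[j]. Clamping the difference at zero in step 868 is an added safeguard against negative counts and is not stated in the text.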

If we do not assume that the longest process is shorter 
than the sample interval A, then buckets representing pro- 
cesses seen only once may actually contain processes of 
length greater than A. However, by looking at buckets 
representing processes that were seen more than once, we 
can iteratively estimate the number of processes in each 
bucket that represent processes of minimal possible length 
by subtracting an estimate of the number of processes in the 
bucket that had longer length. In other words, the technique 
described above also encompasses n>1, where n is the
maximum number of times that any process is sampled. 

The iterative technique described above is useful even when the total measured utilization $U_m$ is known. With this technique, the length distribution of unseen processes can be determined. The distribution should be proportional to the number of unseen processes in each bucket. Let $U_{us(i)}$ be the utilization of unseen processes of length between $\frac{i}{m}$ and $e_{max}$, and let $U_{us(j)}$ be the utilization of unseen processes of length between $\frac{j}{m}$ and $\frac{j+1}{m}$, wherein j = 0, 1, 2, . . . , i-1. The utilization for the unseen processes can be distributed as:

$$U_{us(i)} = \frac{m(1 - e_{max}) f_i}{m(1 - e_{max}) f_i + \sum_{k=0}^{i-1} (m - (k+1)) f_k}\, U_{us}, \text{ and}$$

$$U_{us(j)} = \frac{(m - (j+1)) f_j}{m(1 - e_{max}) f_i + \sum_{k=0}^{i-1} (m - (k+1)) f_k}\, U_{us}$$

wherein j = 0, 1, 2, . . . , i-1. The two equations are illustrated in FIGS. 22 and 23, respectively. In other words, the length distribution of unseen processes is determined by multiplying the total unseen utilization $U_{us}$ by a coefficient, wherein the coefficient is derived from the iterative method.
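As a small illustration of this proportional split, the following Python sketch divides a total unseen utilization among buckets according to the estimated number of unseen processes in each bucket. The function name `distribute_unseen_utilization` and its arguments are hypothetical; the per-bucket unseen counts are assumed to have been produced by the iterative method above.

```python
def distribute_unseen_utilization(u_us, unseen_counts):
    """Split total unseen utilization u_us across buckets.

    unseen_counts[k] is the estimated number of unseen processes in
    bucket k. Each bucket receives a share of u_us proportional to its
    count, i.e. u_us multiplied by count / sum(counts).
    """
    total = sum(unseen_counts)
    if total <= 0:                      # nothing estimated as unseen
        return [0.0] * len(unseen_counts)
    return [u_us * c / total for c in unseen_counts]

# Example: three buckets with estimated unseen counts 6, 2 and 2.4,
# sharing a total unseen utilization of 0.26.
shares = distribute_unseen_utilization(0.26, [6.0, 2.0, 2.4])
```

The shares always sum to the total unseen utilization, so the split changes only how that total is attributed to process lengths, not its size.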

In one embodiment, the enterprise is modeled and/or its configuration is altered in response to the determination(s) of utilization described herein. Modeling according to one embodiment is discussed in detail with reference to FIGS. 6 and 7. In various embodiments, this modeling may further comprise one or more of the following: displaying the determination(s) to a user, predicting future performance, graphing a performance prediction, generating reports, asking a user for further data, and permitting a user to modify a model of the enterprise. In one embodiment, Analyze 406 and/or Predict 408, as discussed in detail with reference to FIGS. 6 and 7, implement the modeling, analysis, and/or prediction in response to the determination(s) of utilization. In one embodiment, a configuration of the enterprise is altered in response to the determination(s) of utilization. Altering a configuration of the enterprise may comprise, for example, reconfiguring a network topology or installing additional resources, such as CPUs, software, memory resources, or network routers or hubs.

Although the system and method of the present invention have been described in connection with several embodiments, the invention is not intended to be limited to the specific forms set forth herein, but on the contrary, it is intended to cover such alternatives, modifications, and equivalents as can be reasonably included within the spirit and scope of the invention as defined by the appended claims.
What is claimed is: 

1. A method for monitoring the state of a computer system, the method comprising:
collecting a set of raw data points over a measurement interval L, wherein the set of raw data points relates to one or more processes on the computer system;
storing the set of raw data points in a memory;
sampling the memory repetitively at a sample interval $\Delta$ to create a set of sampled data points, wherein processes which are included in the set of sampled data points are seen processes and processes which are not included in the set of sampled data points are unseen processes, and wherein the set of sampled data points includes a first sampling time and a last sampling time for each seen process;
statistically estimating a total uncaptured utilization $U_{uc}$, wherein the total uncaptured utilization is an estimation of a total length of unsampled segments for the seen processes of the one or more processes over the measurement interval;
statistically estimating a total unseen utilization $U_{us}$, wherein the total unseen utilization is an estimation of a total length of the unseen processes of the one or more processes over the measurement interval.

2. The method of claim 1,
wherein the statistically estimating a total uncaptured utilization $U_{uc}$ further comprises:
determining a process service time distribution, wherein the process service time distribution estimates a duration of one or more processes;
determining a quantity $n_{cp}$ of seen processes which follow the process service time distribution;
determining a mean residual time $\bar{r}$ for the process service time distribution, wherein the mean residual time estimates a length of an uncaptured residual segment for each seen process;
determining the total uncaptured utilization $U_{uc}$ according to the following equation:

$$U_{uc} = \frac{n_{cp}\,\bar{r}}{L},$$

wherein L is the measurement interval.

3. The method of claim 2,
wherein the determining a mean residual time $\bar{r}$ further comprises:
determining a conditional probability function $G_t(r)$ for the process service time distribution, wherein $G_t(r)$ is a conditional probability that a residual time $R \le r$, given that a process time $X > t$, wherein t is the last sampling time and r is a difference between the last sampling time and a process ending time, and wherein $G_t(r)$ is determined according to the following equation:

$$G_t(r) = P(R \le r \mid X > t) = \frac{P(t < X \le t + r)}{P(X > t)};$$

determining the mean residual time $\bar{r}$ according to the following equation:

$$\bar{r} = \int_0^{\Delta} r \, dG_t(r),$$

wherein $\Delta$ is the sample interval.

4. The method of claim 2,
wherein the determining a process service time distribution comprises determining that the process service time distribution is an exponential distribution with a service rate $\lambda$;
wherein the determining a quantity $n_{cp}$ of seen processes comprises determining a quantity $n_{cp}$ of seen processes which follow the exponential distribution with the service rate $\lambda$;
wherein the determining a mean residual time $\bar{r}$ further comprises determining the mean residual time $\bar{r}$ according to the following equation:

wherein $\Delta$ is the sample interval.

5. The method of claim 2,
wherein the determining a process service time distribution comprises determining that the process service time distribution is a uniform distribution between zero and a constant C;
wherein the determining a quantity $n_{cp}$ of seen processes comprises determining a quantity $n_{cp}$ of seen processes which follow the uniform distribution between zero and C;
wherein each process has a beginning time, and wherein the set of sampled data points includes the beginning time for each seen process;
wherein the determining a mean residual time $\bar{r}$ further comprises determining the mean residual time $\bar{r}$ according to the following equation:

wherein t is the average difference between the last sampling time and the beginning time for the seen processes which follow the uniform distribution between zero and C, wherein $0 \le t \le C$, and wherein $\Delta$ is the sample interval.

6. The method of claim 2,
wherein the determining a process service time distribution comprises determining that the process service time distribution is an unknown distribution;
wherein the determining a quantity $n_{cp}$ of seen processes comprises determining a quantity $n_{cp}$ of seen processes which follow the unknown distribution;
wherein each process has a beginning time, and wherein the set of sampled data points includes the beginning time $b_i$ for each seen process;
wherein the determining a mean residual time $\bar{r}$ further comprises determining the mean residual time $\bar{r}$ according to the following equation:

$$\bar{r} = \frac{\sum_{i=1,\ \forall i:\, s_{1i} > 0}^{n} \max[0,\ (s_{1i} - b_i)]}{\sum_{i=1,\ \forall i:\, s_{1i} > 0}^{n} 1}$$

wherein n is a total quantity of processes, $s_{1i}$ is the first sampling time for each process i, and $b_i$ is a beginning time for each process i.

7. The method of claim 2,
wherein the determining a process service time distribution comprises determining that the process service time distribution is an unknown distribution;
wherein the determining a quantity $n_{cp}$ of seen processes comprises determining a quantity $n_{cp}$ of seen processes which follow the unknown distribution;
wherein each process has a beginning time, and wherein the set of sampled data points includes the beginning time $b_i$ for each seen process;
wherein the determining a mean residual time $\bar{r}$ further comprises determining the mean residual time $\bar{r}$ according to the following equation:

wherein CP is a set of all seen processes which follow the unknown distribution and $s_{1i}$ is the first sampling time for each seen process $i \in CP$.

8. The method of claim 1,
wherein the statistically estimating a total uncaptured utilization $U_{uc}$ further comprises:
determining a plurality d of process service time distributions, wherein each process service time distribution j estimates a duration of one or more processes, wherein $1 \le j \le d$;
for each process service time distribution j, determining a quantity $n_{cpj}$ of seen processes which follow that process service time distribution j;
for each process service time distribution j, determining a mean residual time $\bar{r}_j$ for that process service time distribution j, wherein the mean residual time estimates a length of an uncaptured residual segment for each seen process which follows that process service time distribution j;
determining the total uncaptured utilization $U_{uc}$ according to the following equation:

$$U_{uc} = \frac{\sum_{j=1}^{d} n_{cpj}\,\bar{r}_j}{L},$$

wherein L is the measurement interval.

9. The method of claim 8,
wherein the determining a mean residual time $\bar{r}_j$ for each process service time distribution further comprises:
determining a conditional probability function $G_t(r)$ for each process service time distribution, wherein $G_t(r)$ is a conditional probability that a residual time $R \le r$, given that a process time $X > t$, wherein t is the last sampling time and r is a difference between the last sampling time and a process ending time, and wherein $G_t(r)$ is determined according to the following equation:

$$G_t(r) = P(R \le r \mid X > t) = \frac{P(t < X \le t + r)}{P(X > t)};$$

determining the mean residual time $\bar{r}_j$ according to the following equation:

$$\bar{r}_j = \int_0^{\Delta} r \, dG_t(r),$$

wherein $\Delta$ is the sample interval.
10. The method of claim 8,
wherein the determining a plurality of process service time distributions further comprises determining that one of the process service time distributions is an exponential distribution with a service rate $\lambda$;
wherein the determining a quantity $n_{cpj}$ of seen processes for each process service time distribution further comprises determining a quantity $n_{cpj}$ of seen processes which follow the exponential distribution with the service rate $\lambda$;
wherein the determining a mean residual time $\bar{r}_j$ for each process service time distribution further comprises determining the mean residual time $\bar{r}_j$ for the exponential distribution with the service rate $\lambda$ according to the following equation:

wherein $\Delta$ is the sample interval.
11. The method of claim 8,
wherein the determining a plurality of process service time distributions further comprises determining that one of the process service time distributions is a uniform distribution between zero and a constant C;
wherein the determining a quantity $n_{cpj}$ of seen processes for each process service time distribution further comprises determining a quantity $n_{cpj}$ of seen processes which follow the uniform distribution between zero and C;
wherein each process has a beginning time, and wherein the set of sampled data points includes the beginning time for each seen process;
wherein the determining a mean residual time $\bar{r}_j$ for each process service time distribution further comprises determining the mean residual time $\bar{r}_j$ for the uniform distribution between zero and C according to the following equation:

min(C - t, $\Delta$)

wherein t is the average difference between the last sampling time and the beginning time for the seen processes which follow the uniform distribution between zero and C, wherein $0 \le t \le C$, and wherein $\Delta$ is the sample interval.
12. The method of claim 8,
wherein the determining a plurality of process service time distributions further comprises determining that one of the process service time distributions is an unknown distribution;
wherein the determining a quantity $n_{cpj}$ of seen processes for each process service time distribution further comprises determining a quantity $n_{cpj}$ of seen processes which follow the unknown distribution;
wherein each process has a beginning time, and wherein the set of sampled data points includes the beginning time $b_i$ for each seen process;
wherein the determining a mean residual time $\bar{r}_j$ for each process service time distribution further comprises determining the mean residual time $\bar{r}_j$ for the unknown distribution according to the following equation:

$$\bar{r}_j = \frac{\sum_{i=1,\ \forall i:\, s_{1i} > 0}^{n} \max[0,\ (s_{1i} - b_i)]}{\sum_{i=1,\ \forall i:\, s_{1i} > 0}^{n} 1}$$

wherein n is a total quantity of processes, $s_{1i}$ is the first sampling time for each process i, and $b_i$ is a beginning time for each process i, and wherein $s_{1i} = 0$ for each unseen process and $s_{1i} > 0$ for each seen process.

13. The method of claim 8,
wherein the determining a plurality of process service time distributions further comprises determining that one of the process service time distributions is an unknown distribution;
wherein the determining a quantity $n_{cpj}$ of seen processes for each process service time distribution further comprises determining a quantity $n_{cpj}$ of seen processes which follow the unknown distribution;
wherein each process has a beginning time, and wherein the set of sampled data points includes the beginning time $b_i$ for each seen process;
wherein the determining a mean residual time $\bar{r}_j$ for each process service time distribution further comprises determining the mean residual time $\bar{r}_j$ for the unknown distribution according to the following equation:

wherein CP is a set of all seen processes which follow the unknown distribution and $s_{1i}$ is the first sampling time for each seen process $i \in CP$.

14. The method of claim 1, further comprising:
determining a total captured utilization $U_c$, wherein the total captured utilization measures a total length of sampled segments for the one or more seen processes over the measurement interval;
determining a total measured utilization $U_m$, wherein the total measured utilization $U_m$ measures a total length of all of the one or more processes over the measurement interval;
wherein the statistically estimating a total unseen utilization further comprises determining the total unseen utilization $U_{us}$ according to the following equation:

$$U_{us} = U_m - U_c - U_{uc},$$

wherein $U_{uc}$ is the total uncaptured utilization.

15. The method of claim 14,
wherein each process has a beginning time, and wherein the set of sampled data points includes the beginning time $b_i$ for each seen process;
wherein the determining a total captured utilization $U_c$ further comprises determining $U_c$ according to the following equation:

$$U_c = \frac{1}{L} \sum_{i \in CP} (s_{li} - b_i)$$

wherein CP is a set of all seen processes, $s_{li}$ is the last sampling time for each seen process $i \in CP$, $b_i$ is the beginning time for each seen process $i \in CP$, and L is the measurement interval.

16. The method of claim 1,
wherein the statistically estimating a total unseen utilization further comprises:
creating a plurality of buckets;
placing each seen process into one of the plurality of buckets;
estimating a total quantity of unseen processes for each of a plurality of equal length segments of the sample interval $\Delta$, wherein each segment corresponds to a bucket.

17. The method of claim 1,
wherein the statistically estimating a total unseen utilization $U_{us}$ further comprises:
creating a plurality of buckets with m rows and n columns, wherein n is a maximum number of samples in the set of sampled data points for any particular process, wherein m is a multiple of n, and wherein the buckets are ordered from zero to m-1;
placing each seen process into one of the plurality of buckets, wherein the bucket is labeled according to the following equation:

$$(t - 1)\frac{m}{n} + i,$$

wherein t is a total quantity of samples in the set of sampled data points for this process, wherein i indicates one of $\frac{m}{n}$ equal divisions of the sample interval $\Delta$ such that $i = 0, 1, \ldots, \frac{m}{n} - 1$;
estimating a total quantity of unseen processes for each of $\frac{m}{n}$ length segments of the sample interval $\Delta$, comprising:
counting a total quantity $f_i$ of processes of the greatest length segment contained in the highest-numbered bucket which contains at least one process;
multiplying $f_i$ by m;
determining a fraction of $m \times f_i$ which are unseen processes;
iteratively estimating a total quantity of unseen processes for each lesser length segment of the sample interval $\Delta$, comprising:
counting an initial quantity of processes of the next lesser length segment contained in the next lower-numbered bucket;
calculating a difference of the initial quantity and a fraction of previously calculated longer processes;
calculating a product of the difference and m;
determining a fraction of the product which are unseen processes.
18. The method of claim 1, further comprising:
determining a length distribution of the unseen processes of a greatest length, comprising multiplying the total unseen utilization $U_{us}$ by a first coefficient;
determining a length distribution of the unseen processes of a lesser length, comprising multiplying the total unseen utilization $U_{us}$ by a second coefficient;
wherein the first coefficient and second coefficient are derived from an iterative method, wherein the iterative method comprises:
creating a plurality of buckets;
placing each seen process into one of the plurality of buckets;
estimating a total quantity of unseen processes for each of a plurality of equal length segments of the sample interval $\Delta$, wherein each length segment corresponds to a bucket.

19. The method of claim 1,
wherein the memory is a registry of metrics.

20. The method of claim 1,
wherein the collecting a set of raw data points, the storing the set of raw data points in a memory, and sampling the memory are performed continually and repetitively over the measurement interval.

21. The method of claim 1,
wherein the collecting a set of raw data points is performed a plurality of times at a collecting frequency;
wherein the sampling the memory is performed a plurality of times at a sampling frequency;
wherein the sampling frequency is less than the collecting frequency.

22. The method of claim 1,
wherein the collecting a set of raw data points, the storing the set of raw data points in a memory, the sampling the memory, the statistically estimating a total uncaptured utilization $U_{uc}$, and the statistically estimating a total unseen utilization $U_{us}$ are performed on a single computer system.

23. The method of claim 1,
wherein the collecting a set of raw data points is performed on a different computer system than the statistically estimating a total uncaptured utilization $U_{uc}$ and the statistically estimating a total unseen utilization $U_{us}$.
24. The method of claim 1, further comprising:
modifying a model of the computer system based on the statistically estimating a total uncaptured utilization $U_{uc}$ and the statistically estimating a total unseen utilization $U_{us}$.

25. The method of claim 1, further comprising:
altering a configuration of the computer system based on the statistically estimating a total uncaptured utilization $U_{uc}$ and the statistically estimating a total unseen utilization $U_{us}$.

26. A method for monitoring the state of a computer system, the method comprising:
collecting a set of raw data points over a measurement interval L, wherein the set of raw data points relates to one or more processes on the computer system;
storing the set of raw data points in a memory;
sampling the memory repetitively at a sample interval $\Delta$ to create a set of sampled data points, wherein processes which are included in the set of sampled data points are seen processes and processes which are not included in the set of sampled data points are unseen processes, and wherein the set of sampled data points includes a first sampling time and a last sampling time for each seen process;
statistically estimating a total uncaptured utilization $U_{uc}$, wherein the total uncaptured utilization is an estimation of a total length of unsampled segments for the seen processes of the one or more processes over the measurement interval, comprising:
determining a plurality d of process service time distributions, wherein each process service time distribution j estimates a duration of one or more processes, wherein $1 \le j \le d$;
for each process service time distribution j, determining a quantity $n_{cpj}$ of seen processes which follow that process service time distribution j;
for each process service time distribution j, determining a mean residual time $\bar{r}_j$ for that process service time distribution j, wherein the mean residual time estimates a length of an uncaptured residual segment for each seen process which follows that process service time distribution j;
determining the total uncaptured utilization $U_{uc}$ according to the following equation:

$$U_{uc} = \frac{\sum_{j=1}^{d} n_{cpj}\,\bar{r}_j}{L};$$

statistically estimating a total unseen utilization $U_{us}$, wherein the total unseen utilization is an estimation of a total length of the unseen processes of the one or more processes over the measurement interval;
modifying a model of the computer system based on the statistically estimating a total uncaptured utilization $U_{uc}$ and the statistically estimating a total unseen utilization $U_{us}$.

27. A method for monitoring the state of a computer system, the method comprising:
collecting a set of raw data points over a measurement interval L, wherein the set of raw data points relates to one or more processes on the computer system;
storing the set of raw data points in a memory;
sampling the memory repetitively at a sample interval $\Delta$ to create a set of sampled data points, wherein processes which are included in the set of sampled data points are seen processes and processes which are not included in the set of sampled data points are unseen processes, and wherein the set of sampled data points includes a first sampling time and a last sampling time for each seen process;
statistically estimating a total uncaptured utilization $U_{uc}$, wherein the total uncaptured utilization is an estimation of a total length of unsampled segments for the seen processes over the measurement interval;
statistically estimating a total unseen utilization $U_{us}$, wherein the total unseen utilization is an estimation of a total length of the unseen processes over the measurement interval, comprising:
creating a plurality of buckets;
placing each seen process into one of the plurality of buckets;
estimating a total quantity of unseen processes for each of a plurality of equal length segments of the sample interval $\Delta$, wherein each segment corresponds to a bucket;
modifying a model of the computer system based on the statistically estimating a total uncaptured utilization $U_{uc}$ and the statistically estimating a total unseen utilization $U_{us}$.

28. A system for monitoring the state of a computer 
system, the system comprising: 

a CPU; 

a system memory coupled to the CPU, wherein the system 
memory stores one or more computer programs execut- 
able by the CPU; 
wherein the computer programs are executable to: 
collect a set of raw data points over a measurement 
interval L, wherein the set of raw data points relates to 
a set of processes on the computer system; 
store the set of raw data points in a memory; 
sample the memory repetitively at a sample interval A 
to create a set of sampled data points, wherein 
processes which are included in the set of sampled 
data points are seen processes and processes which 
are not included in the set of sampled data points are 
unseen processes, and wherein the set of sampled 
data points includes a first sampling time and a last 
sampling time for each seen process; 
statistically estimate a total uncaptured utilization U MC , 
wherein the total uncaptured utilization is an estima- 
tion of a total length of unsampled segments for the 
seen processes over the measurement interval; 
statistically estimate a total unseen utilization 
wherein the total unseen utilization is an estimation 
of a total length of the unseen processes over the 
measurement interval. 

29. The system of claim 28, 

wherein in statistically estimating the total uncaptured 
utilization U uc , the computer programs are executable 
to: 

determine a process service time distribution, wherein 
the process service time distribution estimates a 
duration of one or more processes; 

determine a quantity n cp of seen processes which 
follow the process service time distribution; 

determine a mean residual time r for the process service 
time distribution, wherein the mean residual time 
estimates a length of an uncaptured residual segment 
for each seen process; 

determine the total uncaptured utilization accord- 
ing to the following equation: 
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wherein L is the measurement interval. 

30. The system of claim 29, 

wherein in determining a mean residual time $\bar{r}$, the
computer programs are further executable to:

determine a conditional probability function $G_t(r)$ for
the process service time distribution, wherein $G_t(r)$ is
a conditional probability that a residual time $R \le r$,
given that a process time $X > t$, wherein t is the last
sampling time and r is a difference between the last
sampling time and a process ending time, and
wherein $G_t(r)$ is determined according to the following
equation:

$$G_t(r) = P(R \le r \mid X > t) = \frac{P(t < X \le t + r)}{P(X > t)};$$

determine the mean residual time $\bar{r}$ according to the
following equation:

$$\bar{r} = \int_0^{\Delta} r \, dG_t(r),$$

wherein $\Delta$ is the sample interval.
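The integral in this claim can be approximated numerically for any service time distribution whose cumulative distribution function F is known. The sketch below is one possible approach, assuming a midpoint Riemann-Stieltjes sum; the helper name `mean_residual`, the step count, and the parameter values are hypothetical. The closed form used for comparison is obtained here by integrating the formula for an exponential distribution and is not quoted from the claims.

```python
import math
from typing import Callable


def mean_residual(cdf: Callable[[float], float], t: float, delta: float,
                  steps: int = 10_000) -> float:
    """Approximate r_bar = integral_0^delta r dG_t(r),
    where G_t(r) = (F(t + r) - F(t)) / (1 - F(t))."""
    tail = 1.0 - cdf(t)
    if tail <= 0:
        raise ValueError("P(X > t) must be positive")

    def g(r: float) -> float:
        return (cdf(t + r) - cdf(t)) / tail

    h = delta / steps
    total = 0.0
    for k in range(steps):
        lo, hi = k * h, (k + 1) * h
        # Midpoint of the slice times the increment of G_t over the slice.
        total += 0.5 * (lo + hi) * (g(hi) - g(lo))
    return total


# Exponential service time with an illustrative rate of 2 per unit time.
lam, delta = 2.0, 1.0
numeric = mean_residual(lambda x: 1.0 - math.exp(-lam * x), t=3.0, delta=delta)
closed_form = 1.0 / lam - (delta + 1.0 / lam) * math.exp(-lam * delta)
```

Because the exponential distribution is memoryless, the result does not depend on the chosen value of t.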

31. The system of claim 29, 

wherein in determining a process service time
distribution, the computer programs are executable to
determine that the process service time distribution is
an exponential distribution with a service rate $\lambda$;

wherein in determining a quantity $n_{cp}$ of seen processes,
the computer programs are executable to determine a
quantity $n_{cp}$ of seen processes which follow the
exponential distribution with the service rate $\lambda$;

wherein in determining a mean residual time $\bar{r}$, the
computer programs are executable to determine the mean
residual time $\bar{r}$ according to the following equation:

$$\bar{r} = \frac{1}{\lambda} - \left(\Delta + \frac{1}{\lambda}\right) e^{-\lambda\Delta},$$

wherein $\Delta$ is the sample interval.

32. The system of claim 29, 

wherein in determining a process service time
distribution, the computer programs are executable to
determine that the process service time distribution is a
uniform distribution between zero and a constant C;

wherein in determining a quantity $n_{cp}$ of seen processes,
the computer programs are executable to determine a
quantity $n_{cp}$ of seen processes which follow the
uniform distribution between zero and C;

wherein each process has a beginning time, and wherein
the set of sampled data points includes the beginning
time for each seen process;

wherein in determining a mean residual time $\bar{r}$, the
computer programs are executable to determine the mean
residual time $\bar{r}$ according to the following equation:

$$\bar{r} = \frac{\min(C - \bar{t}, \Delta)}{2},$$

wherein $\bar{t}$ is the average difference between the last
sampling time and the beginning time for the seen
processes which follow the uniform distribution
between zero and C, wherein $0 \le \bar{t} \le C$, and wherein
$\Delta$ is the sample interval.

33. The system of claim 29, 

wherein in determining a process service time
distribution, the computer programs are executable to
determine that the process service time distribution is
an unknown distribution;

wherein in determining a quantity $n_{cp}$ of seen processes,
the computer programs are executable to determine a
quantity $n_{cp}$ of seen processes which follow the
unknown distribution;

wherein each process has a beginning time, and wherein
the set of sampled data points includes the beginning
time $b_i$ for each seen process;

wherein in determining a mean residual time $\bar{r}$, the
computer programs are executable to determine the mean
residual time $\bar{r}$ according to the following equation:

$$\bar{r} = \frac{\sum_{i=1}^{n} \max[0, (s_{i1} - b_i)]}{\sum_{i=1,\ \forall s_{i1} > 0}^{n} 1},$$

wherein n is a total quantity of processes, $s_{i1}$ is the first
sampling time for each process i, and $b_i$ is a beginning
time for each process i.
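When the distribution is unknown, the estimate is purely empirical: the gap between each process's beginning and its first sample stands in for the unsampled tail. A minimal sketch of that average follows, assuming (as the parallel claims state) that a first sampling time of zero marks an unseen process; the function name and the data are hypothetical.

```python
from typing import Sequence


def empirical_mean_residual(first_samples: Sequence[float],
                            begins: Sequence[float]) -> float:
    """Average max(0, s_i1 - b_i), dividing by the count of processes with s_i1 > 0."""
    if len(first_samples) != len(begins):
        raise ValueError("inputs must have equal length")
    numerator = sum(max(0.0, s - b) for s, b in zip(first_samples, begins))
    seen = sum(1 for s in first_samples if s > 0)
    if seen == 0:
        raise ValueError("no seen processes")
    return numerator / seen


# Hypothetical data: the second process was never sampled (s_i1 = 0).
s1 = [5.0, 0.0, 10.0, 15.0]
b = [4.0, 6.5, 8.5, 14.5]
r_bar = empirical_mean_residual(s1, b)
```

The `max(0, ...)` term keeps unseen processes from contributing a negative gap to the numerator, while the denominator counts only seen processes.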

34. The system of claim 29, 

wherein in determining a process service time
distribution, the computer programs are executable to
determine that the process service time distribution is
an unknown distribution;

wherein in determining a quantity $n_{cp}$ of seen processes,
the computer programs are executable to determine a
quantity $n_{cp}$ of seen processes which follow the
unknown distribution;

wherein each process has a beginning time, and wherein
the set of sampled data points includes the beginning
time $b_i$ for each seen process;

wherein in determining a mean residual time $\bar{r}$, the
computer programs are executable to determine the mean
residual time $\bar{r}$ according to the following equation:

$$\bar{r} = \frac{\sum_{i \in CP} (s_{i1} - b_i)}{n_{cp}},$$

wherein CP is a set of all seen processes which follow the
unknown distribution and $s_{i1}$ is the first sampling time
for each seen process $i \in CP$.

35. The system of claim 28,

wherein in statistically estimating a total uncaptured
utilization $U_{uc}$, the computer programs are executable to:

determine a set d of process service time distributions,
wherein each process service time distribution j
estimates a duration of one or more processes,
wherein $1 \le j \le d$;

for each process service time distribution j, determine
a quantity $n_{cpj}$ of seen processes which follow that
process service time distribution j;

for each process service time distribution j, determine
a mean residual time $\bar{r}_j$ for that process service time
distribution j, wherein the mean residual time
estimates a length of an uncaptured residual segment for
each seen process which follows that process service
time distribution j;

determine the total uncaptured utilization $U_{uc}$
according to the following equation:

$$U_{uc} = \frac{\sum_{j=1}^{d} n_{cpj}\,\bar{r}_j}{L},$$

wherein L is the measurement interval.

36. The system of claim 35,

wherein in determining a mean residual time $\bar{r}_j$ for each
process service time distribution, the computer
programs are executable to:

determine a conditional probability function $G_t(r)$ for
each process service time distribution, wherein $G_t(r)$
is a conditional probability that a residual time $R \le r$,
given that a process time $X > t$, wherein t is the last
sampling time and r is a difference between the last
sampling time and a process ending time, and
wherein $G_t(r)$ is determined according to the following
equation:

$$G_t(r) = P(R \le r \mid X > t) = \frac{P(t < X \le t + r)}{P(X > t)};$$

determine the mean residual time $\bar{r}_j$ according to the
following equation:

$$\bar{r}_j = \int_0^{\Delta} r \, dG_t(r),$$

wherein $\Delta$ is the sample interval.

37. The system of claim 35,

wherein in determining a set of process service time
distributions, the computer programs are executable to
determine that one of the process service time
distributions is an exponential distribution with a service
rate $\lambda$;

wherein in determining a quantity $n_{cpj}$ of seen processes
for each process service time distribution, the computer
programs are executable to determine a quantity $n_{cpj}$ of
seen processes which follow the exponential
distribution with the service rate $\lambda$;

wherein in determining a mean residual time $\bar{r}_j$ for each
process service time distribution, the computer
programs are executable to determine the mean residual
time $\bar{r}_j$ for the exponential distribution with the service
rate $\lambda$ according to the following equation:

$$\bar{r}_j = \frac{1}{\lambda} - \left(\Delta + \frac{1}{\lambda}\right) e^{-\lambda\Delta},$$

wherein $\Delta$ is the sample interval.

38. The system of claim 35,

wherein in determining a set of process service time
distributions, the computer programs are executable to
determine that one of the process service time
distributions is a uniform distribution between zero and a
constant C;

wherein in determining a quantity $n_{cpj}$ of seen processes
for each process service time distribution, the computer
programs are executable to determine a quantity $n_{cpj}$ of
seen processes which follow the uniform distribution
between zero and C;

wherein each process has a beginning time, and wherein
the set of sampled data points includes the beginning
time for each seen process;

wherein in determining a mean residual time $\bar{r}_j$ for each
process service time distribution, the computer
programs are executable to determine the mean residual
time $\bar{r}_j$ for the uniform distribution between zero and C
according to the following equation:

$$\bar{r}_j = \frac{\min(C - \bar{t}, \Delta)}{2},$$

wherein $\bar{t}$ is the average difference between the last
sampling time and the beginning time for the seen
processes which follow the uniform distribution
between zero and C, wherein $0 \le \bar{t} \le C$, and wherein
$\Delta$ is the sample interval.
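The per-distribution bookkeeping described in claims 35 and 38 can be illustrated with a short sketch. It assumes the total uncaptured utilization is the sum over distributions of the seen-process count times that distribution's mean residual time, divided by L, and it uses the uniform-distribution formula min(C - t, Δ)/2 for one group. All function names and numbers are hypothetical.

```python
from typing import Iterable, Tuple


def uniform_mean_residual(c: float, t_bar: float, delta: float) -> float:
    """min(C - t_bar, delta) / 2 for a uniform(0, C) service time, 0 <= t_bar <= C."""
    if not 0 <= t_bar <= c:
        raise ValueError("t_bar must lie in [0, C]")
    return min(c - t_bar, delta) / 2.0


def total_uncaptured(groups: Iterable[Tuple[int, float]], interval: float) -> float:
    """Sum n_cpj * r_bar_j over distributions j, then divide by L (assumed form)."""
    return sum(n * r for n, r in groups) / interval


groups = [
    (10, uniform_mean_residual(c=8.0, t_bar=3.0, delta=2.0)),  # min(5, 2) / 2 = 1.0
    (20, 0.5),  # hypothetical mean residual for a second distribution
]
u_uc = total_uncaptured(groups, interval=200.0)
```

Grouping seen processes by distribution lets each group use whichever residual formula fits it best before the contributions are pooled.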

39. The system of claim 35, 

wherein in determining a set of process service time
distributions, the computer programs are executable to
determine that one of the process service time
distributions is an unknown distribution;

wherein in determining a quantity $n_{cpj}$ of seen processes
for each process service time distribution, the computer
programs are executable to determine a quantity $n_{cpj}$ of
seen processes which follow the unknown distribution;

wherein each process has a beginning time, and wherein
the set of sampled data points includes the beginning
time $b_i$ for each seen process;

wherein in determining a mean residual time $\bar{r}_j$ for each
process service time distribution, the computer
programs are executable to determine the mean residual
time $\bar{r}_j$ for the unknown distribution according to the
following equation:

$$\bar{r}_j = \frac{\sum_{i=1}^{n} \max[0, (s_{i1} - b_i)]}{\sum_{i=1,\ \forall s_{i1} > 0}^{n} 1},$$

wherein n is a total quantity of processes, $s_{i1}$ is the first
sampling time for each process i, and $b_i$ is a beginning
time for each process i, and wherein $s_{i1} = 0$ for each
unseen process and $s_{i1} > 0$ for each seen process.

40. The system of claim 35, 

wherein in determining a set of process service time
distributions, the computer programs are executable to
determine that one of the process service time
distributions is an unknown distribution;

wherein in determining a quantity $n_{cpj}$ of seen processes
for each process service time distribution, the computer
programs are executable to determine a quantity $n_{cpj}$ of
seen processes which follow the unknown distribution;

wherein each process has a beginning time, and wherein
the set of sampled data points includes the beginning
time $b_i$ for each seen process;

wherein in determining a mean residual time $\bar{r}_j$ for each
process service time distribution, the computer
programs are executable to determine the mean residual
time $\bar{r}_j$ for the unknown distribution according to the
following equation:

$$\bar{r}_j = \frac{\sum_{i \in CP} (s_{i1} - b_i)}{n_{cpj}},$$

wherein CP is a set of all seen processes which follow the
unknown distribution and $s_{i1}$ is the first sampling time
for each seen process $i \in CP$.

41. The system of claim 28,

wherein the computer programs are further executable to:

determine a total captured utilization $U_c$, wherein the
total captured utilization measures a total length of
sampled segments for one or more seen processes
over the measurement interval;

determine a total measured utilization $U_m$, wherein the
total measured utilization $U_m$ measures a total length
of all of the one or more processes over the
measurement interval;

wherein in statistically estimating a total unseen
utilization $U_{us}$, the computer programs are executable to
determine the total unseen utilization $U_{us}$ according
to the following equation:

$$U_{us} = U_m - U_c - U_{uc},$$

wherein $U_{uc}$ is the total uncaptured utilization.
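The subtraction in this claim treats the unseen utilization as whatever measured utilization is left after the captured and uncaptured parts of the seen processes are accounted for. A minimal sketch, with a hypothetical function name and made-up utilization fractions:

```python
def total_unseen_utilization(u_m: float, u_c: float, u_uc: float) -> float:
    """U_us = U_m - U_c - U_uc: measured minus captured minus uncaptured."""
    return u_m - u_c - u_uc


# Hypothetical fractions of the measurement interval.
u_us = total_unseen_utilization(u_m=0.60, u_c=0.45, u_uc=0.10)
```

The estimate is only as good as the uncaptured term, so an overestimated mean residual time would push the unseen figure down, possibly below zero.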

42. The system of claim 41,

wherein each process has a beginning time, and wherein
the set of sampled data points includes the beginning
time $b_i$ for each seen process;

wherein in determining a total captured utilization $U_c$, the
computer programs are executable to determine $U_c$
according to the following equation:

$$U_c = \frac{\sum_{i \in CP} (s_{i m_i} - b_i)}{L},$$

wherein CP is a set of all seen processes, $s_{i m_i}$ is the last
sampling time for each seen process $i \in CP$, $b_i$ is the
beginning time for each seen process $i \in CP$, and L is the
measurement interval.

43. The system of claim 28,

wherein in statistically estimating a total unseen
utilization $U_{us}$, the computer programs are executable to:

create a plurality of buckets;

place each seen process into one of the plurality of
buckets;

estimate a total quantity of unseen processes for each of
a plurality of equal length segments of the sample
interval $\Delta$, wherein each segment corresponds to a
bucket.

44. The system of claim 28,

wherein in statistically estimating a total unseen
utilization $U_{us}$, the computer programs are executable to:

create a plurality of buckets with m rows and n
columns, wherein n is a maximum number of
samples in the set of sampled data points for any
particular process, wherein m is a multiple of n, and
wherein the buckets are ordered from zero to m-1;

place each seen process into one of the plurality of
buckets, wherein the bucket is labeled according to
the following equation:

$$(t - 1)\frac{m}{n} + i,$$

wherein t is a total quantity of samples in the set of
sampled data points for this process, wherein i
indicates one of $m/n$ equal divisions of the sample
interval $\Delta$ such that

$$i = 0, 1, \ldots, \frac{m}{n} - 1;$$

estimate a total quantity of unseen processes for each of
$m/n$ length segments of the sample interval $\Delta$,
wherein the computer programs are executable to:

count a total quantity $f_i$ of processes of the greatest
length segment contained in the highest-numbered
bucket which contains at least one process;

multiply $f_i$ by m;

determine a fraction of $m \times f_i$ which are unseen
processes;

iteratively estimate a total quantity of unseen
processes for each lesser length segment of the
sample interval $\Delta$, wherein the computer programs
are executable to:

count an initial quantity of processes of the next
lesser length segment contained in the next
lower-numbered bucket;

calculate a difference of the initial quantity and a
fraction of previously calculated longer
processes;

calculate a product of the difference and m;

determine a fraction of the product which are
unseen processes.
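The bucket labeling in claim 44 can be sketched as a small index computation. The sketch assumes the label has the form (t - 1)(m/n) + i with i running from 0 to m/n - 1, which is consistent with the buckets being ordered from zero to m-1; the function name and the choice m = 6, n = 3 are hypothetical.

```python
def bucket_label(t: int, i: int, m: int, n: int) -> int:
    """Label (t - 1) * (m / n) + i for a process seen in t samples, sub-division i."""
    if m % n != 0:
        raise ValueError("m must be a multiple of n")
    per_sample = m // n  # number of equal divisions of the sample interval
    if not (1 <= t <= n and 0 <= i < per_sample):
        raise ValueError("t or i out of range")
    return (t - 1) * per_sample + i


# Every (t, i) pair maps to a distinct bucket between 0 and m - 1.
labels = sorted(bucket_label(t, i, m=6, n=3) for t in range(1, 4) for i in range(2))
```

Under that assumption, processes seen in more samples land in higher-numbered buckets, which is why the iterative estimate starts from the highest-numbered non-empty bucket and works downward.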

45. The system of claim 28, 

wherein the computer programs are further executable to: 
determine a length distribution of the unseen processes 
of a greatest length, comprising multiplying the total 
unseen utilization by a first coefficient; 
determine a length distribution of the unseen processes 
of a lesser length, comprising multiplying the total 
unseen utilization by a second coefficient; 
wherein the first coefficient and second coefficient are 
derived from an iterative method, wherein in per- 
forming the iterative method to determine the first 
coefficient and second coefficient, the computer pro- 
grams are executable to: 
create a plurality of buckets; 
place each seen process into one of the plurality of 
buckets; 

estimate a total quantity of unseen processes for each
of a plurality of equal length segments of the
sample interval $\Delta$, wherein each length segment
corresponds to a bucket.

46. The system of claim 28, 

wherein the memory comprises a registry of metrics. 

47. The system of claim 28, 

wherein the computer programs are further executable to
modify a model of the computer system based on the
statistically estimating a total uncaptured utilization
$U_{uc}$ and the statistically estimating a total unseen
utilization $U_{us}$.

48. The system of claim 28, 

wherein the computer programs are further executable to
alter a configuration of the computer system based on
the statistically estimating a total uncaptured utilization
$U_{uc}$ and the statistically estimating a total unseen
utilization $U_{us}$.

49. A memory medium which stores program instructions 
for monitoring the state of a computer system, wherein the 
program instructions are executable to implement: 

collecting a set of raw data points over a measurement
interval L, wherein the set of raw data points relates to
one or more processes on the computer system;

storing the set of raw data points in a memory;

sampling the memory repetitively at a sample interval $\Delta$
to create a set of sampled data points, wherein
processes which are included in the set of sampled data
points are seen processes and processes which are not
included in the set of sampled data points are unseen
processes, and wherein the set of sampled data points
includes a first sampling time and a last sampling time
for each seen process;

statistically estimating a total uncaptured utilization $U_{uc}$,
wherein the total uncaptured utilization is an estimation
of a total length of unsampled segments for the seen
processes over the measurement interval;

statistically estimating a total unseen utilization $U_{us}$,
wherein the total unseen utilization is an estimation of
a total length of the unseen processes over the
measurement interval.

50. The memory medium of claim 49,

wherein the statistically estimating a total uncaptured
utilization $U_{uc}$ further comprises:

determining a process service time distribution,
wherein the process service time distribution
estimates a duration of one or more processes;

determining a quantity $n_{cp}$ of seen processes which
follow the process service time distribution;

determining a mean residual time $\bar{r}$ for the process
service time distribution, wherein the mean residual
time estimates a length of an uncaptured residual
segment for each seen process; and

determining the total uncaptured utilization $U_{uc}$
according to the following equation:

$$U_{uc} = \frac{n_{cp}\,\bar{r}}{L},$$

wherein L is the measurement interval.

51. The memory medium of claim 50,

wherein the determining a mean residual time $\bar{r}$ further
comprises:

determining a conditional probability function $G_t(r)$ for
the process service time distribution, wherein $G_t(r)$ is
a conditional probability that a residual time $R \le r$,
given that a process time $X > t$, wherein t is the last
sampling time and r is a difference between the last
sampling time and a process ending time, and
wherein $G_t(r)$ is determined according to the following
equation:

$$G_t(r) = P(R \le r \mid X > t) = \frac{P(t < X \le t + r)}{P(X > t)};$$

determining the mean residual time $\bar{r}$ according to the
following equation:

$$\bar{r} = \int_0^{\Delta} r \, dG_t(r),$$

wherein $\Delta$ is the sample interval.

52. The memory medium of claim 50,

wherein the determining a process service time
distribution comprises determining that the process service
time distribution is an exponential distribution with a
service rate $\lambda$;

wherein the determining a quantity $n_{cp}$ of seen processes
comprises determining a quantity $n_{cp}$ of seen processes
which follow the exponential distribution with the
service rate $\lambda$;

wherein the determining a mean residual time $\bar{r}$ further
comprises determining the mean residual time $\bar{r}$
according to the following equation:

$$\bar{r} = \frac{1}{\lambda} - \left(\Delta + \frac{1}{\lambda}\right) e^{-\lambda\Delta},$$

wherein $\Delta$ is the sample interval.

53. The memory medium of claim 50,

wherein the determining a process service time
distribution comprises determining that the process service
time distribution is a uniform distribution between zero
and a constant C;

wherein the determining a quantity $n_{cp}$ of seen processes
comprises determining a quantity $n_{cp}$ of seen processes
which follow the uniform distribution between zero and
C;

wherein each process has a beginning time, and wherein
the set of sampled data points includes the beginning
time for each seen process;

wherein the determining a mean residual time $\bar{r}$ further
comprises determining the mean residual time $\bar{r}$
according to the following equation:

$$\bar{r} = \frac{\min(C - \bar{t}, \Delta)}{2},$$

wherein $\bar{t}$ is the average difference between the last
sampling time and the beginning time for the seen
processes which follow the uniform distribution
between zero and C, wherein $0 \le \bar{t} \le C$, and wherein
$\Delta$ is the sample interval.

54. The memory medium of claim 50, 

wherein the determining a process service time
distribution comprises determining that the process service
time distribution is an unknown distribution;

wherein the determining a quantity $n_{cp}$ of seen processes
comprises determining a quantity $n_{cp}$ of seen processes
which follow the unknown distribution;

wherein each process has a beginning time, and wherein
the set of sampled data points includes the beginning
time $b_i$ for each seen process;

wherein the determining a mean residual time $\bar{r}$ further
comprises determining the mean residual time $\bar{r}$
according to the following equation:

$$\bar{r} = \frac{\sum_{i=1}^{n} \max[0, (s_{i1} - b_i)]}{\sum_{i=1,\ \forall s_{i1} > 0}^{n} 1},$$

wherein n is a total quantity of processes, $s_{i1}$ is the first
sampling time for each process i, and $b_i$ is a beginning
time for each process i.

55. The memory medium of claim 50, 

wherein the determining a process service time
distribution comprises determining that the process service
time distribution is an unknown distribution;

wherein the determining a quantity $n_{cp}$ of seen processes
comprises determining a quantity $n_{cp}$ of seen processes
which follow the unknown distribution;

wherein each process has a beginning time, and wherein
the set of sampled data points includes the beginning
time $b_i$ for each seen process;

wherein the determining a mean residual time $\bar{r}$ further
comprises determining the mean residual time $\bar{r}$
according to the following equation:

$$\bar{r} = \frac{\sum_{i \in CP} (s_{i1} - b_i)}{n_{cp}},$$

wherein CP is a set of all seen processes which follow the
unknown distribution and $s_{i1}$ is the first sampling time
for each seen process $i \in CP$.

56. The memory medium of claim 49,

wherein the statistically estimating a total uncaptured
utilization $U_{uc}$ further comprises:

determining a set d of process service time
distributions, wherein each process service time
distribution j estimates a duration of one or more
processes, wherein $1 \le j \le d$;

for each process service time distribution j, determining
a quantity $n_{cpj}$ of seen processes which follow that
process service time distribution j;

for each process service time distribution j, determining
a mean residual time $\bar{r}_j$ for that process service time
distribution j, wherein the mean residual time
estimates a length of an uncaptured residual segment for
each seen process which follows that process service
time distribution j; and

determining the total uncaptured utilization $U_{uc}$
according to the following equation:

$$U_{uc} = \frac{\sum_{j=1}^{d} n_{cpj}\,\bar{r}_j}{L},$$

wherein L is the measurement interval.

57. The memory medium of claim 56, 

wherein the determining a mean residual time $\bar{r}_j$ for each
process service time distribution further comprises:

determining a conditional probability function $G_t(r)$ for
each process service time distribution, wherein $G_t(r)$
is a conditional probability that a residual time $R \le r$,
given that a process time $X > t$, wherein t is the last
sampling time and r is a difference between the last
sampling time and a process ending time, and
wherein $G_t(r)$ is determined according to the following
equation:

$$G_t(r) = P(R \le r \mid X > t) = \frac{P(t < X \le t + r)}{P(X > t)};$$

determining the mean residual time $\bar{r}_j$ according to the
following equation:

$$\bar{r}_j = \int_0^{\Delta} r \, dG_t(r),$$

wherein $\Delta$ is the sample interval.

58. The memory medium of claim 56, 

wherein the determining a set of process service time
distributions further comprises determining that one of
the process service time distributions is an exponential
distribution with a service rate $\lambda$;

wherein the determining a quantity $n_{cpj}$ of seen processes
for each process service time distribution further
comprises determining a quantity $n_{cpj}$ of seen processes
which follow the exponential distribution with the
service rate $\lambda$;

wherein the determining a mean residual time $\bar{r}_j$ for each
process service time distribution further comprises
determining the mean residual time $\bar{r}_j$ for the
exponential distribution with the service rate $\lambda$ according to the
following equation:

$$\bar{r}_j = \frac{1}{\lambda} - \left(\Delta + \frac{1}{\lambda}\right) e^{-\lambda\Delta},$$

wherein $\Delta$ is the sample interval.

59. The memory medium of claim 56, 

wherein the determining a set of process service time
distributions further comprises determining that one of
the process service time distributions is a uniform
distribution between zero and a constant C;

wherein the determining a quantity $n_{cpj}$ of seen processes
for each process service time distribution further
comprises determining a quantity $n_{cpj}$ of seen processes
which follow the uniform distribution between zero and
C;

wherein each process has a beginning time, and wherein
the set of sampled data points includes the beginning
time for each seen process;

wherein the determining a mean residual time $\bar{r}_j$ for each
process service time distribution further comprises
determining the mean residual time $\bar{r}_j$ for the uniform
distribution between zero and C according to the
following equation:

$$\bar{r}_j = \frac{\min(C - \bar{t}, \Delta)}{2},$$

wherein $\bar{t}$ is the average difference between the last
sampling time and the beginning time for the seen
processes which follow the uniform distribution
between zero and C, wherein $0 \le \bar{t} \le C$, and wherein
$\Delta$ is the sample interval.

60. The memory medium of claim 56,

wherein the determining a set of process service time
distributions further comprises determining that one of
the process service time distributions is an unknown
distribution;

wherein the determining a quantity $n_{cpj}$ of seen processes
for each process service time distribution further
comprises determining a quantity $n_{cpj}$ of seen processes
which follow the unknown distribution;

wherein each process has a beginning time, and wherein
the set of sampled data points includes the beginning
time $b_i$ for each seen process;

wherein the determining a mean residual time $\bar{r}_j$ for each
process service time distribution further comprises
determining the mean residual time $\bar{r}_j$ for the unknown
distribution according to the following equation:

$$\bar{r}_j = \frac{\sum_{i=1}^{n} \max[0, (s_{i1} - b_i)]}{\sum_{i=1,\ \forall s_{i1} > 0}^{n} 1},$$

wherein n is a total quantity of processes, $s_{i1}$ is the first
sampling time for each process i, and $b_i$ is a beginning
time for each process i, and wherein $s_{i1} = 0$ for each
unseen process and $s_{i1} > 0$ for each seen process.

61. The memory medium of claim 56, 

wherein the determining a set of process service time
distributions further comprises determining that one of
the process service time distributions is an unknown
distribution;

wherein the determining a quantity $n_{cpj}$ of seen processes
for each process service time distribution further
comprises determining a quantity $n_{cpj}$ of seen processes
which follow the unknown distribution;

wherein each process has a beginning time, and wherein
the set of sampled data points includes the beginning
time $b_i$ for each seen process;

wherein the determining a mean residual time $\bar{r}_j$ for each
process service time distribution further comprises
determining the mean residual time $\bar{r}_j$ for the unknown
distribution according to the following equation:

$$\bar{r}_j = \frac{\sum_{i \in CP} (s_{i1} - b_i)}{n_{cpj}},$$

wherein CP is a set of all seen processes which follow the
unknown distribution and $s_{i1}$ is the first sampling time
for each seen process $i \in CP$.

62. The memory medium of claim 49,

wherein the program instructions further implement:

determining a total captured utilization $U_c$, wherein the
total captured utilization measures a total length of
sampled segments for the one or more seen processes
over the measurement interval; and

determining a total measured utilization $U_m$, wherein
the total measured utilization measures a total length
of all of the one or more processes over the
measurement interval;

wherein the statistically estimating a total unseen
utilization $U_{us}$ further comprises determining the total
unseen utilization $U_{us}$ according to the following
equation:

$$U_{us} = U_m - U_c - U_{uc},$$

wherein $U_{uc}$ is the total uncaptured utilization.

63. The memory medium of claim 62,

wherein each process has a beginning time, and wherein
the set of sampled data points includes the beginning
time $b_i$ for each seen process;

wherein the determining a total captured utilization $U_c$
further comprises determining $U_c$ according to the
following equation:

$$U_c = \frac{\sum_{i \in CP} (s_{i m_i} - b_i)}{L},$$

wherein CP is a set of all seen processes, $s_{i m_i}$ is the last
sampling time for each seen process $i \in CP$, $b_i$ is the
beginning time for each seen process $i \in CP$, and L is the
measurement interval.

64. The memory medium of claim 49, 

wherein the statistically estimating a total unseen
utilization $U_{us}$ further comprises:

creating a plurality of buckets;

placing each seen process into one of the plurality of
buckets; and

estimating a total quantity of unseen processes for each
of a plurality of equal length segments of the sample
interval $\Delta$, wherein each segment corresponds to a
bucket.

65. The memory medium of claim 49, 

wherein the statistically estimating a total unseen
utilization $U_{us}$ further comprises:

creating a plurality of buckets with m rows and n
columns, wherein n is a maximum number of
samples in the set of sampled data points for any
particular process, wherein m is a multiple of n, and
wherein the buckets are ordered from zero to m-1;

placing each seen process into one of the plurality of
buckets, wherein the bucket is labeled according to
the following equation:

$$(t - 1)\frac{m}{n} + i,$$

wherein t is a total quantity of samples in the set of
sampled data points for this process, wherein i
indicates one of $m/n$ equal divisions of the sample
interval $\Delta$ such that

$$i = 0, 1, \ldots, \frac{m}{n} - 1;$$

and

estimating a total quantity of unseen processes for
each of $m/n$ length segments of the sample interval
$\Delta$, comprising:

counting a total quantity $f_i$ of processes of the
greatest length segment contained in the
highest-numbered bucket which contains at
least one process;

multiplying $f_i$ by m;

determining a fraction of $m \times f_i$ which are unseen
processes; and

iteratively estimating a total quantity of unseen
processes for each lesser length segment of the
sample interval $\Delta$, comprising:

counting an initial quantity of processes of the
next lesser length segment contained in the
next lower-numbered bucket;

calculating a difference of the initial quantity
and a fraction of previously calculated longer
processes;

calculating a product of the difference and m;
and

determining a fraction of the product which are
unseen processes.

66. The memory medium of claim 49, 

wherein the program instructions further implement: 
determining a length distribution of the unseen pro- 
cesses of a greatest length, comprising multiplying 
the total unseen utilization by a first coefficient; 
and 

determining a length distribution of the unseen pro- 
cesses of a lesser length, comprising multiplying the 
total unseen utilization by a second coefficient; 

wherein the first coefficient and second coefficient are 
derived from an iterative method, wherein the itera- 
tive method comprises: 
creating a plurality of buckets; 
placing each seen process into one of the plurality of 
buckets; and 

estimating a total quantity of unseen processes for
each of a plurality of equal length segments of the
sample interval $\Delta$, wherein each length segment
corresponds to a bucket.

67. The memory medium of claim 49, 

wherein the memory comprises a registry of metrics. 

68. The memory medium of claim 49, 

wherein the collecting a set of raw data points, the storing 
the set of raw data points in a memory, and sampling 
the memory are performed continually and repetitively 
over the measurement interval. 

69. The memory medium of claim 49, 

wherein the collecting a set of raw data points is per- 
formed a plurality of times at a collecting frequency; 

wherein the sampling the memory is performed a plurality 
of times at a sampling frequency; 

wherein the sampling frequency is less than the collecting 
frequency. 
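Because sampling happens less often than collection, a process that begins and ends between two consecutive samples is never observed. The sketch below illustrates that classification under stated assumptions: samples fall at integer multiples of the sample interval, and a process is visible on the half-open interval from its beginning to its end. The function name, the process names, and the times are all hypothetical.

```python
import math
from typing import Optional, Tuple


def first_and_last_sample(begin: float, end: float,
                          delta: float) -> Optional[Tuple[float, float]]:
    """Return (first, last) sampling instants k * delta with begin <= k * delta < end,
    or None when no sampling instant falls inside the process lifetime (unseen)."""
    k_first = math.ceil(begin / delta)
    k_last = math.ceil(end / delta) - 1
    if k_last < k_first:
        return None
    return k_first * delta, k_last * delta


# Hypothetical (begin, end) times with a sample interval of 1.0.
processes = {"a": (0.5, 3.2), "b": (1.1, 1.9), "c": (4.0, 4.5)}
result = {name: first_and_last_sample(b, e, delta=1.0)
          for name, (b, e) in processes.items()}
```

Here process "b" is unseen, while "a" is seen with unsampled segments before its first sample and after its last one; those segments are what the uncaptured utilization estimates.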

70. The memory medium of claim 49, 

wherein the collecting a set of raw data points, the storing
the set of raw data points in a memory, the sampling the
memory, the statistically estimating a total uncaptured
utilization $U_{uc}$, and the statistically estimating a total
unseen utilization $U_{us}$ are performed on a single
computer system.

71. The memory medium of claim 49,

wherein the collecting a set of raw data points is
performed on a different computer system than the
statistically estimating a total uncaptured utilization $U_{uc}$ and
the statistically estimating a total unseen utilization
$U_{us}$.

72. The memory medium of claim 49, 

wherein the program instructions further implement
modifying a model of the computer system based on
the statistically estimating a total uncaptured utilization
$U_{uc}$ and the statistically estimating a total unseen
utilization $U_{us}$.

73. The memory medium of claim 49, 

wherein the program instructions further implement 
altering a configuration of the computer system based 
on the statistically estimating a total uncaptured utili- 
zation U_uc and the statistically estimating a total unseen 
utilization U_us. 

74. A memory medium which stores program instructions 
for monitoring the state of a computer system, wherein the 
program instructions are executable to implement: 

collecting a set of raw data points over a measurement 
interval L, wherein the set of raw data points relates to 
one or more processes on the computer system; 

storing the set of raw data points in a memory; 

sampling the memory repetitively at a sample interval Δ 
to create a set of sampled data points, wherein pro- 
cesses which are included in the set of sampled data 
points are seen processes and processes which are not 
included in the set of sampled data points are unseen 
processes, and wherein the set of sampled data points 
includes a first sampling time and a last sampling time 
for each seen process; 

statistically estimating a total uncaptured utilization U_uc, 
wherein the total uncaptured utilization is an estimation 
of a total length of unsampled segments for the seen 
processes of the one or more processes over the mea- 
surement interval, comprising: 

determining a set d of process service time 
distributions, wherein each process service time dis- 
tribution j estimates a duration of one or more 
processes, wherein 1 ≤ j ≤ d; 

for each process service time distribution j, determining 
a quantity n_cpj of seen processes which follow that 
process service time distribution j; 

for each process service time distribution j, determining 
a mean residual time r_j for that process service time 
distribution j, wherein the mean residual time esti- 
mates a length of an uncaptured residual segment for 
each seen process which follows that process service 
time distribution j; and 

determining the total uncaptured utilization U_uc accord- 
ing to the following equation: 
statistically estimating a total unseen utilization U_us, 
wherein the total unseen utilization is an estimation 
of a total length of the unseen processes of the one 
or more processes over the measurement interval; 
and 
modifying a model of the computer system based on 
the statistically estimating a total uncaptured utiliza- 
tion U_uc and the statistically estimating a total unseen 
utilization U_us. 
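
The per-distribution steps of claim 74 can be sketched as follows. The claimed equation is not reproduced in this text, so the sketch does not attempt to restate it. It assumes a standard renewal-theory definition of mean residual time, E[X^2] / (2 E[X]), and it assumes the total unsampled length is the count-weighted sum of the mean residual times. Both assumptions, and the function names, are illustrative rather than taken from the claims.

```python
def mean_residual_time(mean, second_moment):
    """Mean residual time of a service time distribution.

    Assumption: the renewal-theory form E[X^2] / (2 * E[X]).
    For an exponential distribution this equals the mean.
    """
    return second_moment / (2.0 * mean)

def total_unsampled_length(seen_counts, residual_times):
    """Count-weighted sum of mean residual times over all distributions j.

    seen_counts[j] is the number of seen processes following distribution j,
    residual_times[j] is the mean residual time for distribution j.
    Assumption: one residual segment per seen process; any scaling by the
    measurement interval L is left to the caller.
    """
    return sum(n * r for n, r in zip(seen_counts, residual_times))

# Exponential with mean 2 (E[X^2] = 8) and a fixed length of 4 (E[X^2] = 16).
residuals = [mean_residual_time(2.0, 8.0), mean_residual_time(4.0, 16.0)]
total = total_unsampled_length([3, 5], residuals)
```

Here both distributions happen to have a mean residual time of 2.0, so three seen processes of the first kind and five of the second give an estimated total of 16.0 time units.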
75. A memory medium which stores program instructions 
for monitoring the state of a computer system, wherein the 
program instructions are executable to implement: 

collecting a set of raw data points over a measurement 
interval L, wherein the set of raw data points relates to 
one or more processes on the computer system; 
storing the set of raw data points in a memory; 
sampling the memory repetitively at a sample interval Δ 
to create a set of sampled data points, wherein pro- 
cesses which are included in the set of sampled data 
points are seen processes and processes which are not 
included in the set of sampled data points are unseen 
processes, and wherein the set of sampled data points 
includes a first sampling time and a last sampling time 
for each seen process; 

statistically estimating a total uncaptured utilization U_uc, 
wherein the total uncaptured utilization is an estimation 
of a total length of unsampled segments for the seen 
processes over the measurement interval; and 

statistically estimating a total unseen utilization U_us, 
wherein the total unseen utilization is an estimation of 
a total length of the unseen processes over the mea- 
surement interval, comprising: 
creating a plurality of buckets; 

placing each seen process into one of the plurality of 
buckets; 

estimating a total quantity of unseen processes for each 
of a plurality of equal length segments of the sample 
interval Δ, wherein each segment corresponds to a 
bucket; and 

modifying a model of the computer system based on 
the statistically estimating a total uncaptured utiliza- 
tion U_uc and the statistically estimating a total unseen 
utilization U_us. 
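
The bucket steps of claim 75 can be illustrated with a simplified stand-in. This is not the iterative method of the claims. It is a one-pass inverse-probability estimate under stated assumptions: each seen process shorter than the sample interval has a known or estimated length, start times are uniformly distributed relative to the sampling ticks, and a process of length l shorter than the interval is therefore seen with probability l divided by the interval. The function name and parameters are hypothetical.

```python
def estimate_unseen_per_bucket(seen_lengths, delta, num_buckets):
    """Estimate unseen process counts per equal-length segment of delta.

    Each bucket k covers lengths in [k * width, (k + 1) * width).
    Assumption: a process whose length is near the bucket midpoint m is
    seen with probability m / delta, so n_seen observed processes imply
    roughly n_seen * (1 - p) / p unseen ones.
    """
    width = delta / num_buckets
    counts = [0] * num_buckets
    for length in seen_lengths:
        # Processes at least as long as delta are always seen; skip them.
        if 0 < length < delta:
            idx = min(int(length / width), num_buckets - 1)
            counts[idx] += 1
    unseen = []
    for k, n_seen in enumerate(counts):
        p_seen = (k + 0.5) * width / delta
        unseen.append(n_seen * (1.0 - p_seen) / p_seen)
    return counts, unseen

counts, unseen = estimate_unseen_per_bucket([0.25, 0.25, 0.75], 1.0, 2)
```

Short processes are the least likely to be sampled, so each seen process in a low bucket stands in for several unseen ones, while the high buckets contribute comparatively few.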